
A simple method to assess and report thematic saturation in qualitative research

Greg Guest

1 Q42 Research, Research Triangle Park, North Carolina, United States of America

Emily Namey

2 Global Health, Population, and Nutrition, FHI 360, Durham, North Carolina, United States of America

Associated Data

All relevant data are within the manuscript's Supporting Information files.

Data saturation is the most commonly employed concept for estimating sample sizes in qualitative research. Over the past 20 years, scholars using both empirical research and mathematical/statistical models have made significant contributions to the question: How many qualitative interviews are enough? This body of work has advanced the evidence base for sample size estimation in qualitative inquiry during the design phase of a study, prior to data collection, but it does not provide qualitative researchers with a simple and reliable way to determine the adequacy of sample sizes during and/or after data collection. Using the principle of saturation as a foundation, we describe and validate a simple-to-apply method for assessing and reporting on saturation in the context of inductive thematic analyses. Following a review of the empirical research on data saturation and sample size estimation in qualitative research, we propose an alternative way to evaluate saturation that overcomes the shortcomings and challenges associated with existing methods identified in our review. Our approach includes three primary elements in its calculation and assessment: Base Size, Run Length, and New Information Threshold. We additionally propose a more flexible approach to reporting saturation. To validate our method, we use a bootstrapping technique on three existing thematically coded qualitative datasets generated from in-depth interviews. Results from this analysis indicate the method we propose to assess and report on saturation is feasible and congruent with findings from earlier studies.

Introduction

Data saturation is the conceptual yardstick for estimating and assessing qualitative sample sizes. During the past two decades, scholars have conducted empirical research and developed mathematical/statistical models designed to estimate the likely number of qualitative interviews needed to reach saturation for a given study. Although this body of work has advanced the evidence base for sample size estimation during the design phase of a qualitative study, it does not provide a method to determine saturation, and the adequacy of sample sizes, during and/or after data collection. As Morse pointed out more than 20 years ago, “saturation is an important component of rigor. It is present in all qualitative research but, unfortunately, it is evident mainly by declaration” [ 1 ]. In this paper we present a method to assess and report on saturation that enables qualitative researchers to speak about--and provide some evidence for--saturation that goes beyond simple declaration.

To provide the foundation for this approach, we define saturation and then review the work to date on estimating saturation and sample sizes for in-depth interviews. We follow this with an overview of the few empirically-based methods that have been put forward to operationalize and measure saturation and identify challenges of applying these approaches to real-life research contexts, particularly those that use inductive thematic analyses. We subsequently propose an alternative way of evaluating saturation and offer a relatively easy-to-use method of assessing and reporting on it during or after an inductive thematic analysis. We test and validate our method using a bootstrapping technique on three distinctly different qualitative datasets.

The method we propose is designed for qualitative data collection techniques that aim to generate narratives–i.e., focus groups and one-on-one interviews that use open-ended questioning with inductive probing (though we have only attempted to validate the method on individual interview data). Our method also specifically applies to contexts in which an inductive thematic analysis [ 2 – 4 ] is used, where emergent themes are discovered in the data and then transformed into codes.

A brief history of saturation and qualitative sample size estimation

How many qualitative interviews are enough? Across academic disciplines, and for about the past five decades, the answer to this question has usually revolved around reaching saturation [ 1 , 5 – 9 ]. The concept of saturation was first introduced into the field of qualitative research as “theoretical saturation” by Glaser and Strauss in their 1967 book The Discovery of Grounded Theory [ 10 ]. They defined the term as the point at which “no additional data are being found whereby the [researcher] can develop properties of the category” (pg. 61). Their definition was specifically intended for the practice of building and testing theoretical models using qualitative data and refers to the point at which the theoretical model being developed stabilizes. Many qualitative data analyses, however, do not use the specific grounded theory method, but rather a more general inductive thematic analysis. Over time, the broader term “data saturation” has become increasingly adopted, to reflect a wider application of the term and concept. In this broader sense, saturation is often described as the point in data collection and analysis when new incoming data produces little or no new information to address the research question [ 4 , 9 , 11 – 13 ].

Interestingly, empirical research on saturation began with efforts to determine when one might expect it to be reached. Though “interviewing until saturation” was recognized as a best practice, it was not a sufficient description of sample size. In most research contexts, sample size specification and justification is required by funders, ethics committees, and other reviewers before a study is implemented [ 14 , 15 ]. Applied qualitative researchers faced the question: How do I estimate how many interviews I’ll need before I head into the field?

Empirical research to address this issue began appearing in the literature in the early 2000s. Morgan et al. [ 16 ] conducted a pioneering methodological study using data collected on environmental risks. They found that the first five to six interviews produced the majority of new information in the dataset, and that little new information was gained as the sample size approached 20 interviews. Across four datasets, approximately 80% to 92% of all concepts identified within the dataset were noted within the first 10 interviews. Similarly, Guest et al. [ 9 ] conducted a stepwise inductive thematic analysis of 60 in-depth interviews among female sex workers in West Africa and discovered that 70% of all 114 identified themes turned up in the first six interviews, and 92% were identified within the first 12 interviews. Subsequent studies by Francis et al. and Namey et al. [ 17 , 18 ] reported similar findings. Building on these earlier studies, Hagaman and Wutich [ 19 ] calculated saturation within a cross-cultural study and found that fewer than 16 interviews were enough to reach data saturation at each of the four sites but that 20–40 interviews were necessary to identify cross-cultural meta-themes across sites.

Using a meta-analytic approach, Galvin [ 20 ] reviewed and statistically analyzed—using binomial logic—54 qualitative studies. He found the probability of identifying a concept (theme) among a sample of six individuals is greater than 99% if that concept is shared among 55% of the larger study population. Employing this same logic, Fugard and Potts [ 21 ] developed a quantitative tool to estimate sample sizes needed for thematic analyses of qualitative data. Their calculation incorporates: (1) the estimated prevalence of a theme within the population, (2) the number of desired instances of that theme, and (3) the desired power for a study. Their tool estimates, for example, that to have 80% power to detect two instances of a theme with a 10% prevalence in a population, 29 participants would be required. Note that their model assumes a random sample.
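To make the binomial logic concrete, the short Python sketch below (our illustration, not code from the cited papers) reproduces both figures above under the assumptions those models make: a random sample and a known, fixed theme prevalence.

```python
from math import comb

def prob_at_least(k: int, n: int, prevalence: float) -> float:
    """Probability that a theme held by `prevalence` of the population
    appears at least k times among n randomly sampled participants."""
    p_fewer = sum(comb(n, i) * prevalence**i * (1 - prevalence)**(n - i)
                  for i in range(k))
    return 1 - p_fewer

# Galvin's result: a theme shared by 55% of the population is almost
# certain to appear at least once within six interviews.
print(round(prob_at_least(1, 6, 0.55), 4))   # 0.9917 -> greater than 99%

# Fugard & Potts' example: roughly 80% power to observe two instances
# of a theme with 10% prevalence requires about 29 participants.
print(round(prob_at_least(2, 29, 0.10), 4))  # ~0.80
```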

The above studies are foundational in the field of qualitative sample size estimation. They provide empirically-based guidance for approximating how many qualitative interviews might be needed for a given study and serve a role analogous to power calculations in quantitative research design (albeit in some cases without the math and degree of precision). And, like power calculations, they are moot once data collection begins. Estimates are based on (specified) assumptions and expectations regarding various elements in a particular study. As all researchers know, reality often presents surprises. Though a study may be powered to certain parameters (quantitative) or have a sample size based on empirical guidance (qualitative), after data collection is completed the resulting data may not conform to either.

Not surprisingly, researchers have recently begun asking two follow-up questions about data saturation that go beyond estimation: How can we better operationalize the concept of saturation? and How do we know if we have reached saturation?

Operationalizing and assessing saturation

The range of empirical work on saturation in qualitative research and detail on the operationalization and assessment metrics used in data-driven studies that address saturation are summarized in Table 1 . In reviewing these studies to inform the development of our approach to assessing saturation, we identified three limitations to the broad application of saturation assessment processes which we sought to overcome: lack of comparability of metrics, reliance on probability theory or random sampling, and retrospective assessment dependent on having a fully coded/analyzed dataset. We discuss each limitation briefly before introducing our alternative approach.

Lack of comparability in metrics

Current operationalizations of saturation vary widely in the criteria used to arrive at a binary determination of saturation having been reached or not reached (e.g., Francis et al. [ 17 ] and Coenen et al. [ 22 ]). Given how different approaches are–in terms of units of analysis and strictness of saturation thresholds–it is difficult to understand how much confidence to have in a conclusion about whether saturation was reached or not. Unlike quantitative researchers using statistical analysis methods who have established options for levels of confidence intervals and other metrics to report, there are no agreed-upon metrics to help qualitative researchers interpret the strength of their saturation findings. The method we propose facilitates qualitative researchers’ choice among levels of assessment criteria along with a common description of those criteria that will allow readers to interpret conclusions regarding saturation with more or less confidence, depending on the strictness of the criteria used.

Reliance on probability theory, and/or the assumption of a random sample

Basing assessments of saturation on probabilistic assumptions (e.g., Lowe et al. [ 26 ], Fugard & Potts [ 21 ], Galvin [ 20 ]) ignores the fact that most qualitative research employs non-probabilistic, purposive sampling suited to the nature and objectives of qualitative inquiry [ 28 ]. Even in cases where random sampling is employed, the open-ended nature of qualitative inquiry doesn’t lend itself well to probability theory or statistical inference to a larger population because response categories are not structured, so are not mutually exclusive. The expression of Theme A is not necessarily to the exclusion of Theme B, nor does the absence of the expression of Theme A necessarily indicate Not-A. Further, from a logistical standpoint, many qualitative researchers do not have the expertise, nor the time required, to perform complicated statistical tests on their datasets. Our approach involves only simple arithmetic and calculation of percentages.

Retrospective assessment dependent on having a fully coded/analyzed dataset

Methods that calculate saturation based on the proportion of new themes relative to the overall number of themes in a dataset (e.g., Guest et al. [ 9 ], Hennink et al. [ 23 ]) are limited by the total number of interviews conducted: the denominator represents the total number of themes in the fully-analyzed dataset and is fixed, while the number of themes in the numerator gets closer to the denominator with every new interview considered, thus eventually reaching 100% saturation. Saturation will inevitably occur in a retrospectively-assessed, fully-analyzed, fixed-size dataset. The method we outline eliminates this problem by using a subset of data items in the denominator instead of the entire dataset, facilitating better prospective assessment of saturation and offering the advantage of allowing researchers to stop before reaching a pre-specified number of interviews. (Under our approach, however, a measure of percent saturation as defined by these authors will not be available.)

An alternative approach and method to calculating and reporting saturation

For the purposes of our assessment, saturation refers to the point during data analysis at which incoming data points (interviews) produce little or no new useful information relative to the study objectives. Our approach to operationalizing this definition of saturation consists of three distinct elements–the base size , the run length , and the relative amount of incoming new information, or the new information threshold .

When assessing saturation, incoming information is weighed against the information already obtained. Base size refers to how we circumscribe the body of information already identified in a dataset to subsequently use as a denominator (similar to Francis et al.’s initial analysis sample). In other words, what is the minimum number of data collection events (i.e., interviews) we should review/analyze to calculate the amount of information already gained? We know that if we use all of the data collection events as our base size, we can reach saturation by default as there are no more data to consider. We also know from previous studies [ 9 , 16 , 29 ] that most novel information in a qualitative dataset is generated early in the process, and generally follows an asymptotic curve, with a relatively sharp decline in new information occurring after just a small number of data collection/analysis events. For this reason, we have chosen to test 4, 5, and 6 interviews as base sizes from which to calculate the total number of unique themes to be used in the denominator of the saturation ratio. The unit of analysis for base size is the data collection event; the items of analysis are unique codes representing themes.

A run can be defined as a set of consecutive events or observations, in this case interviews. The run length is the number of interviews within which we look for, and calculate, new information . The number of new themes found in the run defines the numerator in the saturation ratio. Hagaman and Wutich (2017) and Francis et al. (2010), for example, consider runs of three data collection events each time they (re)assess the number of new themes for the numerator, whereas Coenen et al. (2012) include only two events in their data runs. For our analyses we provide both options for run lengths in our calculations–two events and three events–to afford researchers more flexibility. Note that in our analyses, successive runs overlap: each set of interviews shifts to the right or “forward” in time by one event. Fig 1 shows the process, and how base size and run length relate to one another. Here again the unit of analysis is the data collection event; the items of analysis are unique codes.
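Putting these two elements together in shorthand (our notation, not part of the published method): let B be the number of unique themes identified across the base interviews and N_r the number of new themes identified in run r. The proportion of new information after run r is

\[ \text{new information}_r = \frac{N_r}{B} \times 100\%, \]

and saturation is indicated at the first run for which this proportion falls at or below the chosen new information threshold, described next.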

[Fig 1. Image file: pone.0232076.g001.jpg]

New information threshold

Once units of analysis for the numerator and denominator are determined, the proportional calculation is simple. But the next question is a purely subjective one: What level of paucity of new information should we accept as indicative of saturation? We propose that furnishing researchers with options—rather than a prescriptive threshold—is a more realistic, transparent and accurate practice. We therefore initially propose two new information thresholds, representing the proportion of new information we would accept as evidence that saturation has been reached at a given point in data collection: ≤5% new information and no (0%) new information.

These new information thresholds can be used as benchmarks similar to how a p-value of <0.05 or <0.01 is used to determine whether enough evidence exists to reject a null hypothesis in statistical analysis. As in statistical analysis—but absent the probability theory—there is no guarantee that saturation is in fact reached when meeting these thresholds. But they do provide a transparent way of presenting data saturation assessments that can be subsequently interpreted by other researchers. The lower the new information threshold, the less likely an important number of themes may remain undiscovered in later interviews if data collection stops when the threshold is reached. Taken together, the concepts of base size, run length, and new information threshold allow researchers to choose how stringently they wish to apply the saturation concept–and the level of confidence they might have that data saturation was attained for a given sample ( Fig 2 ).

[Fig 2. Image file: pone.0232076.g002.jpg]

The advantages of the method we propose are several:

  • It does not assume or require a random sample, nor prior knowledge of theme prevalence.
  • Calculation is simple. It can be done quickly and with no statistical expertise.
  • Metrics can be used prospectively during the data collection and analysis process to ascertain when saturation is reached (providing the possibility of conducting fewer data collection events than planned).
  • Metrics can be used retrospectively , after data collection and analysis are complete, to report on the adequacy of the sample to reach thematic saturation.
  • Options for each metric can be specified prior to analysis or reported after data analysis.
  • The metrics are flexible. Researchers have options for how they describe saturation and can also use the term with more transparency and precision.
  • Saturation is conceptualized as a relative measure. This neutralizes differences in the level of coding granularity among researchers, as the method affects both numerator and denominator.

Application of the approach

An example of prospective data saturation calculation.

Let’s consider a step-by-step example of how this process works, using a hypothetical dataset to illustrate the approach. We will prospectively calculate saturation using a base size of 4 interviews and run length of 2 interviews. For this example, we have selected a new information threshold of ≤ 5% to indicate that we have reached adequate saturation. [The data used for each step are included in Fig 3 , along with indication of the base, runs, and saturation points.]

[Fig 3. Image file: pone.0232076.g003.jpg]

STEP 1 – Find the number of unique themes for the base

We start by looking at the first four interviews conducted and summing the number of unique themes identified within this group. The resulting sum, 37, is the denominator in our equation.

STEP 2 – Find the number of unique themes for the first run

In this example, we’re using a run length of two, so include data for the next two interviews after the base set–i.e., interviews 5 and 6. After reviewing those interviews, let’s say we identified four new themes in interview 5 and three new themes in interview 6. The number of new themes in this first run is seven.

STEP 3 – Calculate the saturation ratio

Divide the number of new themes in this run (seven) by the number of unique themes in the base set (37). The quotient reveals 19% new information. This is not below our ≤5% threshold, so we continue.

STEP 4 – Find the number of new unique themes for the next run in the series

For the next run we add the new themes for the next two interviews, 6 and 7 (note the overlap of interview 6), resulting in a sum of four.

STEP 5 – Update saturation ratio

Take the number of new themes in the latest run (four) and divide by the number of themes in the base set (37). This renders a quotient of 11%, still not below our ≤5% threshold. We continue to the next run.

STEP 6 – Find the number of new unique themes for the next run in the series

For this third run we add the number of new themes identified within interviews 7 and 8.

STEP 7 – Update saturation ratio

Take the number of new themes in the latest run (one) and divide by the number of themes in the base set (37). This yields a quotient of approximately 3% new information.

At this point the proportion of new information added by the last run is below the ≤5% threshold we established, so we stop here after the 8th interview, with a good sense that the amount of new information has diminished to a level where we could say saturation has been reached based on our subjective metric of ≤5%. Since the last two interviews did not add substantially to the body of information collected, we would say that saturation was reached at interview 6 (the next two interviews were completed to see how much new information they would generate and whether it would fall below the set threshold). We would annotate these two extra interviews (indicative of run length) by appending a superscript “+2” to the interview number, to indicate that a total of eight interviews were completed. In writing up our saturation assessment, then, we would say that using a base size of 4 we reached the ≤5% new information threshold at 6 +2 interviews.
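For readers who prefer to see the procedure as code, the minimal Python sketch below (our illustration, not software used in the study) reproduces this worked example. The split of the 37 base themes across interviews 1–4 is arbitrary, since only their sum enters the calculation; the counts for interviews 5–8 are taken from the steps above.

```python
def saturation_point(new_per_interview, base_size=4, run_length=2, threshold=0.05):
    """Return the interview at which saturation is indicated (reported with a
    '+run_length' superscript), or None if the threshold is never met.
    new_per_interview[i] = number of themes first appearing in interview i+1."""
    base_total = sum(new_per_interview[:base_size])                  # denominator (37 here)
    for start in range(base_size, len(new_per_interview) - run_length + 1):
        run_new = sum(new_per_interview[start:start + run_length])   # overlapping runs
        if run_new / base_total <= threshold:
            return start    # interviews completed before the confirming run
    return None

# 37 themes across interviews 1-4 (split is arbitrary), then 4, 3, 1, and 0
# new themes in interviews 5-8, as in the worked example.
counts = [15, 10, 7, 5, 4, 3, 1, 0]
print(saturation_point(counts))   # 6 -> reported as 6 +2
```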

If we wanted to be more conservative, and more confident in our conclusion of reaching saturation in this example, we could adjust two parameters of our assessment. We could increase the run length to 3 (or an even larger number), and/or we could set a more stringent new information threshold of no new information. If we consider the hypothetical dataset used here (see Fig 3 ) and keep the run length at 2, the 0% new information threshold would have been reached at interview 10 +2 .

One may still raise two logical questions after reviewing the example process above. The first is “How do we know that we’re not missing important information by capping our sample at n when saturation is indicated?” Put another way, if we had conducted, say, five more interviews would we have gotten additional and important data? The honest answer to this is that we don’t know, and we can never know unless we conduct those five extra interviews, and then five more after that and so on. That is where we rely on the empirical research that shows the rate at which new information emerges decreases over time and that the most common and salient themes are generated early, assuming that we keep the interview questions, sample characteristics, and other study parameters relatively consistent. To further illustrate how saturation may have been affected by doing additional interviews, we include 20 interviews in Fig 3 . The interviews following Interview 12, though yielding four additional themes, remained at or below the ≤5% new information threshold.

The second question is to a degree related to the first question and pertains to possible order effects. Would the theme identification pattern in a dataset of 20 interviews look the same if interviews #10 through #20 were conducted first? Could new themes start emerging later in the data collection process? Though it is possible an important theme will emerge later in the process/dataset, the empirical studies referenced above demonstrate that the most prevalent, high-level, themes are identified very early on in data collection, within about six interviews. But, to further check this, we use a bootstrapping technique on three actual datasets to corroborate findings from these earlier studies and to assess the distributional properties of our proposed metrics. These bootstrap findings give us information on how saturation may be reached at different stopping points as new themes are discovered in new interviews and when the interviews are ordered randomly in different replications of the sample of interviews.

Sample datasets

We selected three existing qualitative datasets to which we applied the bootstrapping method. Although the datasets were all generated from individual interviews analyzed using an inductive thematic analysis approach, the studies from which they were drawn differed with respect to study population, topics of inquiry, sample heterogeneity, interviewer, and structure of data collection instrument, as described below.

Dataset 1 . This study included 40 individual interviews with African American men in the Southeast US about their health seeking behaviors [ 29 ]. The interview guide contained 13 main questions, each with scripted sub-questions. Inductive probing was employed throughout all interviews. The inductive thematic analysis included 11 of the 13 questions and generated 93 unique codes. The study sample was highly homogenous.

Dataset 2 . The second dataset consists of 48 individual interviews conducted with (mostly white) mothers in the Southeast US about medical risk and research during pregnancy [ 30 ]. The interview guide contained 13 main questions, each with scripted sub-questions. Inductive probing was employed throughout all interviews. Of note, the 48 interviews were conducted, 12 each, using different modes of data collection: in-person, by video (Skype-like platform), email (asynchronous), or text chat (synchronous). The qualitative thematic analysis included 10 of these questions and generated 85 unique codes.

Dataset 3 . This study included 60 interviews with women at higher risk of HIV acquisition—30 participants in Kenya and 30 in South Africa [ 31 ]. The interview was a follow-up qualitative inquiry into women’s responses on a quantitative survey. Though there were 14 questions on the guide, only data from three questions were included in the thematic analysis referenced here. Those three questions generated 55 codes. Participants from the two sites were similar demographically with the exceptions of education and marital status. Substantially more women from the Kenya sample were married and living with their partners (63% versus 3%) and were less likely to have completed at least some secondary education. All interviews were conducted in a local language.

Data from all three studies were digitally recorded and transcribed using a transcription protocol [ 32 ]; transcripts were translated to English for Dataset 3. Transcripts were imported into NVivo [ 33 ] to facilitate coding and analysis. All three datasets were analyzed using a systematic inductive thematic approach [ 2 ], and all codes were explicitly defined in a codebook following a standard template [ 34 ]. For Datasets 1 & 2, two analysts coded each transcript independently and compared code application after each transcript. Discrepancies in code application were resolved through discussion, resulting in consensus-coded documents. For Dataset 3, two coders conducted this type of inter-coder reliability assessment on 20% of the interviews (a standard, more efficient approach than double-coding all interviews [ 2 ]). All three studies were reviewed and approved by the FHI 360 Protection of Human Subjects Committee; the study which produced Dataset 3 was also reviewed and approved by local IRBs in Kenya and South Africa.

Bootstrapping method

While these three studies offer diverse and analytically rigorous case studies, they provide limited generalizability. To approximate population-level statistics and broaden our validation exercise, we drew empirical bootstrap samples from each of the datasets described above. The bootstrap method is a resampling technique that uses the variability within a sample to estimate the sampling distribution of metrics (in this case saturation metrics) empirically [ 35 ]. This is done by randomly resampling from the sample with replacement (i.e., an item may be selected more than once in a resample) many times in a way that mimics the original sampling scheme. For each qualitative dataset, we generated 10,000 resamples from the original sample. In addition, we randomly ordered the selected transcripts in each resample to offset any order effect on how/when new codes are discovered. For each resample, we calculated the proportion of new themes found in run lengths of two or three new events relative to a base size of four, five or six interviews. We then identified the number of transcripts needed to meet a new information threshold of ≤5% or 0%. Based on these thresholds from 10,000 resamples, for each dataset we computed the median and the 5th and 95th percentiles for number of interviews required to reach each new information threshold across different base sizes and run lengths. The 5th and 95th percentiles provide a nonparametric 90% confidence interval for the number of transcripts needed to reach saturation as defined at these new information thresholds.
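For illustration, a sketch of this resampling procedure in Python (ours; it assumes each interview's applied codes are available as a set of code labels) might look like the following.

```python
import random
import statistics

def interviews_to_threshold(code_sets, base_size=4, run_length=2, threshold=0.05):
    """Interviews needed to meet the new information threshold for one ordering of
    interviews, or None if it is never met. `code_sets` is a list of sets, one per
    interview, containing the codes applied to that interview."""
    seen, new_counts = set(), []
    for codes in code_sets:
        new_counts.append(len(codes - seen))    # themes not seen in earlier interviews
        seen |= codes
    base_total = sum(new_counts[:base_size])
    for start in range(base_size, len(new_counts) - run_length + 1):
        if sum(new_counts[start:start + run_length]) / base_total <= threshold:
            return start
    return None

def bootstrap_saturation(code_sets, n_resamples=10_000, seed=1, **kwargs):
    """Resample interviews with replacement, shuffle their order, and summarize the
    distribution of stopping points (median plus 5th and 95th percentiles)."""
    rng = random.Random(seed)
    stops = []
    for _ in range(n_resamples):
        resample = rng.choices(code_sets, k=len(code_sets))   # with replacement
        rng.shuffle(resample)                                 # offset order effects
        stop = interviews_to_threshold(resample, **kwargs)
        if stop is not None:
            stops.append(stop)
    cuts = statistics.quantiles(stops, n=20)                  # 5th, 10th, ..., 95th
    return {"median": statistics.median(stops), "p5": cuts[0], "p95": cuts[18]}
```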

Since the total number of codes identified in each dataset was available, we carried out one additional calculation as a way to provide another metric to understand how the median number of interviews to reach a new information threshold related to retrospectively-assessed degrees of saturation for the entire dataset. In this case, once the number of interviews to reach a new information threshold was determined for each run of a dataset, we divided the number of unique themes identified up to that point by the total number of unique themes. This provided a percent–or degree–of saturation for each run of the data, which was then used to generate a median and 5th and 95th percentiles for the degree of saturation reached. This can then be compared across base sizes, run lengths, and new information thresholds. [Note that we include this as a further way to understand and validate the proposed approach for calculating saturation, rather than as part of the proposed process.]
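In the same shorthand as above (our notation): if T_i denotes the set of unique themes coded in interview i, s the number of interviews at which the new information threshold was met, and n the total number of interviews in the dataset, then

\[ \text{degree of saturation} = \frac{\lvert T_1 \cup \cdots \cup T_s \rvert}{\lvert T_1 \cup \cdots \cup T_n \rvert} \times 100\%. \]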

Results

The results from the bootstrapping analyses are presented by dataset in Tables 2, 3 and 4. Each table presents the median and percentiles of the bootstrap distribution using base sizes of 4, 5 or 6 and run lengths of 2 and 3, at new information thresholds of ≤5% and no new information.

Note that, as described in the example above, the number of interviews in the run length is not included in the number of interviews to reach the given new information threshold, so the total number of events needed to assess having reached the threshold is two or three more interviews than the given median, depending on the run length of choice. This is indicated by a superscript +2 or +3.

For Dataset 1 ( Table 2 ), at the ≤5% new information threshold, the median number of interviews needed to reach a drop-off in new information was consistent across all base sizes. At a run length of two interviews, the median number of interviews required before a drop in new information was observed was six. This means that relative to the total number of unique codes identified in the first four, five, or six interviews, the amount of new information contributed by interviews 7 and 8 was less than or equal to 5% of the total. At a run length of three interviews, the median number of interviews required before a drop in new information was observed was seven. This means that relative to the total number of unique codes identified in the first four, five, or six interviews, the amount of new information contributed by interviews 8, 9, and 10 was less than or equal to 5% of the total. Across base sizes, for a run length of two, we would say that saturation was indicated at 6 +2 , while for a run length of three we would say saturation was observed at 7 +3 , both at the ≤5% new information level. Using the total number of themes in the dataset retrospectively, the number of themes evident across 6–7 interviews corresponded with a median degree of saturation of 78% to 82%.

At the 0% new information threshold, the median number of interviews to indicate saturation was again consistent across base sizes, varying only by the run length. The median numbers of interviews required were 11 +2 and 14 +3 . In other words, at run length 2, it took 11 interviews, plus two more to confirm that no new information was contributed. At run length 3 it was 14 interviews plus three more to confirm no new information. The number of themes evident across 11–14 interviews corresponded with a median degree of saturation of 87% to 89%.

The results for Dataset 2 were nearly identical to Dataset 1 ( Table 3 ). Saturation was indicated at 6 interviews at a run length of 2 (6 +2 ) and 7–8 interviews at run length 3 (7 +3 or 8 +3 ). The number of themes evident across 6–8 interviews corresponded with a median degree of saturation of 79% to 82%. At the 0% new information threshold saturation was indicated at the same points as in Dataset 1: 11 +2 and 14 +3 , consistent across all base sizes. In other words, no new information was observed after a median of 11 interviews using a run-length of 2, nor after 14 interviews using a run length of 3. Here again, despite a different total number of themes in the overall dataset, the number of new themes evident across 11–14 interviews corresponded with a median degree of saturation of 87% to 89%.

Dataset 3 ( Table 4 ) contained more variation in the sample than the others, which was reflected in a slightly higher median number of interviews and a lower degree of saturation. At the ≤5% new information threshold, the median number of interviews required to reach saturation at a run length of 2 was 8–9 (higher for base size 4). At a run length of 3, the median number of required interviews was 11–12 (again higher for base size 4). The number of new themes evident across 8–12 interviews corresponded with a median degree of saturation of 62% to 71%. At the 0% new information threshold, saturation was indicated at 12 +2 and 16 +3 , consistent across base sizes. The number of new themes evident across 12–16 interviews corresponded with a median degree of saturation of 69% to 76%.

Discussion

In this paper we present a way of assessing thematic saturation in inductive analysis of qualitative interviews. We describe how this method circumvents many of the limitations associated with other ways of conceptualizing, assessing and reporting on saturation within an in-depth interview context. The process can be applied either prospectively during the data collection and analysis process or retrospectively, after data collection and analysis are complete. A key advantage is that the metrics are flexible, affording researchers the ability to choose different degrees of rigor by selecting different run lengths and/or new information thresholds. Similarly, the method allows for different options–and greater clarity and transparency–in describing and reporting on saturation.

Based on the bootstrapping analyses we can draw several conclusions. The first is that the results are within the range of what we would have expected based on previous empirical studies. Using the ≤5% new information threshold, our findings indicate that typically 6–7 interviews will capture the majority of themes in a homogenous sample (6 interviews to reach 80% saturation). Our analyses also show that at the higher end of the range for this option (95th percentile) 11–12 interviews might be needed, tracking with existing literature indicating 12 interviews are typically needed to reach higher degrees of saturation.

We can also draw other lessons to inform application of this process:

  • Base size appears to have almost no effect on the outcome. This is important from an efficiency perspective. If our findings hold true in other contexts, it suggests that using a default base size of four interviews is sufficient. In practical terms, this implies that saturation should initially be assessed after six interviews (four in the base, and two in the run). If analyzing data in real time, the results of this initial assessment can then determine whether or not more interviews are needed.
  • Run length has an effect on the outcome, as one would expect. The longer the run length, the greater number of interviews required to reach saturation. The size of run length effect is smallest–very minimal–if employing the ≤5% new information threshold. The practical implication of this finding is that researchers can choose a longer run length–e.g., three interviews (or more)–to generate a more conservative assessment of saturation.
  • The new information threshold selected affects the point at which saturation is indicated, as one would expect. The lower the new information threshold–and therefore the more conservative the allowance for recognizing new information–the more interviews are needed to achieve saturation. From an applied standpoint this finding is important in that researchers can feel confident that choosing a more stringent new information threshold–e.g., 0%—will result in a more conservative assessment of saturation, if so desired.

There are, of course, still limitations to this approach. It was developed with applied inductive thematic analyses in mind–those for which the research is designed to answer a relatively narrow question about a specific real-world issue or problem–and the datasets used in the bootstrapping analyses were generated and analyzed within this framework. The applicability of this approach for qualitative research with a different epistemological or phenomenological perspective is yet untested. Another potential limitation of this method relates to codebook structure. When conducting an inductive thematic analysis, researchers must decide on an appropriate codebook organizational scheme (see Hennink et al. [ 23 ] for discussion on this as it relates to saturation). We tested our method on single-tier codebooks, but qualitative researchers often create hierarchical codebooks. A two-tier structure with primary (“parent”) codes and constituent secondary (“child”) codes is a common form, but researchers may also want to identify and look for higher-level, meta-themes (e.g., Hagaman and Wutich [ 19 ]). For any method of assessing saturation, including ours, researchers need to decide at which level they will identify and include themes/codes. For inductive thematic analyses this is a subjective decision that depends on the degree of coding granularity necessary for a particular analytic objective, and how the research team wants to discuss saturation when reporting study findings. That said, a researcher could, with this approach, run and report on saturation analyses of two or more codebooks that contain differing levels of coding granularity.

Tran and colleagues [ 24 ] accurately point out that determining the point of saturation is a difficult endeavor, because “researchers have information on only what they have found” (pg. 17). They further argue that the stopping point for an inductive study is typically determined by the “judgement and experience of researchers”. We acknowledge and agree with these assertions.

Selecting and interpreting levels of rigor, precision, and confidence is a subjective enterprise. What a quantitative researcher accepts, for example, as a large enough effect size or a small enough p-value is a subjective determination and based on convention in a particular field of study. The same can be said for how a researcher chooses to report and interpret statistical findings. P-values can be expressed either in absolute terms (e.g., p = .043) or in several commonly used increments (e.g., p < .05, p < .01, etc.). Likewise, while an odds ratio of 1.2 may be statistically significant, whether or not it’s meaningful in a real-world sense is entirely open to interpretation.

We are advocating for similar flexibility and transparency in assessing and reporting on thematic saturation. We have provided researchers with a method to easily calculate saturation during or after data collection. This method also enables researchers to select different levels of the constituent elements in the process–i.e., Base Size, Run Length and New Information Threshold–based on how confident they wish to be that their interpretations and conclusions are based on a dataset that reached thematic saturation. We hope researchers find this method useful, and that others build on our work by empirically testing the method on different types of datasets drawn from diverse study populations and contexts.

Supporting information

S1 Datasets

Acknowledgments

We would like to thank Betsy Tolley for reviewing an earlier draft of this work and Alissa Bernholc for programming support.

Funding Statement

The authors received no specific funding for this work.

Data Availability

All relevant data are within the manuscript's Supporting Information files.





Citation: Guest G, Namey E, Chen M (2020) A simple method to assess and report thematic saturation in qualitative research. PLoS ONE 15(5): e0232076. https://doi.org/10.1371/journal.pone.0232076

Editor: Andrew Soundy, University of Birmingham, UNITED KINGDOM

Received: January 4, 2020; Accepted: April 3, 2020; Published: May 5, 2020

Copyright: © 2020 Guest et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



Competing interests: The authors have declared that no competing interests exist.

Introduction

Data saturation is the conceptual yardstick for estimating and assessing qualitative sample sizes. During the past two decades, scholars have conducted empirical research and developed mathematical/statistical models designed to estimate the likely number of qualitative interviews needed to reach saturation for a given study. Although this body of work has advanced the evidence base for sample size estimation during the design phase of a qualitative study, it does not provide a method to determine saturation, and the adequacy of sample sizes, during and/or after data collection. As Morse pointed out more than 20 years ago, “saturation is an important component of rigor. It is present in all qualitative research but, unfortunately, it is evident mainly by declaration” [ 1 ]. In this paper we present a method to assess and report on saturation that enables qualitative researchers to speak about--and provide some evidence for--saturation that goes beyond simple declaration.

To provide the foundation for this approach, we define saturation and then review the work to date on estimating saturation and sample sizes for in-depth interviews. We follow this with an overview of the few empirically-based methods that have been put forward to operationalize and measure saturation and identify challenges of applying these approaches to real-life research contexts, particularly those that use inductive thematic analyses. We subsequently propose an alternative way of evaluating saturation and offer a relatively easy-to-use method of assessing and reporting on it during or after an inductive thematic analysis. We test and validate our method using a bootstrapping technique on three distinctly different qualitative datasets.

The method we propose is designed for qualitative data collection techniques that aim to generate narratives–i.e., focus groups and one-on-one interviews that use open-ended questioning with inductive probing (though we have only attempted to validate the method on individual interview data). Our method also specifically applies to contexts in which an inductive thematic analysis [ 2 – 4 ] is used, where emergent themes are discovered in the data and then transformed into codes.

A brief history of saturation and qualitative sample size estimation

How many qualitative interviews are enough? Across academic disciplines, and for about the past five decades, the answer to this question has usually revolved around reaching saturation [ 1 , 5 – 9 ]. The concept of saturation was first introduced into the field of qualitative research as “theoretical saturation” by Glaser and Strauss in their 1967 book The Discovery of Grounded Theory [ 10 ]. They defined the term as the point at which “no additional data are being found whereby the [researcher] can develop properties of the category” (pg. 61). Their definition was specifically intended for the practice of building and testing theoretical models using qualitative data and refers to the point at which the theoretical model being developed stabilizes. Many qualitative data analyses, however, do not use the specific grounded theory method, but rather a more general inductive thematic analysis. Over time, the broader term “data saturation” has become increasingly adopted, to reflect a wider application of the term and concept. In this broader sense, saturation is often described as the point in data collection and analysis when new incoming data produces little or no new information to address the research question [ 4 , 9 , 11 – 13 ].

Interestingly, empirical research on saturation began with efforts to determine when one might expect it to be reached. Though “interviewing until saturation” was recognized as a best practice, it was not a sufficient description of sample size. In most research contexts, sample size specification and justification is required by funders, ethics committees, and other reviewers before a study is implemented [ 14 , 15 ]. Applied qualitative researchers faced the question: How do I estimate how many interviews I’ll need before I head into the field?

Empirical research to address this issue began appearing in the literature in the early 2000s. Morgan et al. [ 16 ] conducted a pioneering methodological study using data collected on environmental risks. They found that the first five to six interviews produced the majority of new information in the dataset, and that little new information was gained as the sample size approached 20 interviews. Across four datasets, approximately 80% to 92% of all concepts identified within the dataset were noted within the first 10 interviews. Similarly, Guest et al. [ 9 ] conducted a stepwise inductive thematic analysis of 60 in-depth interviews among female sex workers in West Africa and discovered that 70% of all 114 identified themes turned up in the first six interviews, and 92% were identified within the first 12 interviews. Subsequent studies by Francis et al. and Namey et al. [ 17 , 18 ] reported similar findings. Building on these earlier studies, Hagaman and Wutich [ 19 ] calculated saturation within a cross-cultural study and found that fewer than 16 interviews were enough to reach data saturation at each of the four sites, but that 20–40 interviews were necessary to identify cross-cultural meta-themes across sites.

Using a meta-analytic approach, Galvin [ 20 ] reviewed and statistically analyzed—using binomial logic—54 qualitative studies. He found the probability of identifying a concept (theme) among a sample of six individuals is greater than 99% if that concept is shared among 55% of the larger study population. Employing this same logic, Fugard and Potts [ 21 ] developed a quantitative tool to estimate sample sizes needed for thematic analyses of qualitative data. Their calculation incorporates: (1) the estimated prevalence of a theme within the population, (2) the number of desired instances of that theme, and (3) the desired power for a study. Their tool estimates, for example, that to have 80% power to detect two instances of a theme with a 10% prevalence in a population, 29 participants would be required. Note that their model assumes a random sample.

The above studies are foundational in the field of qualitative sample size estimation. They provide empirically-based guidance for approximating how many qualitative interviews might be needed for a given study and serve a role analogous to power calculations in quantitative research design (albeit in some cases without the math and degree of precision). And, like power calculations, they are moot once data collection begins. Estimates are based on (specified) assumptions and expectations regarding various elements in a particular study. As all researchers know, reality often presents surprises. Though a study may be powered to certain parameters (quantitative) or have a sample size based on empirical guidance (qualitative), after data collection is completed the resulting data may not conform to either.

Not surprisingly, researchers have recently begun asking two follow-up questions about data saturation that go beyond estimation: How can we better operationalize the concept of saturation? And how do we know if we have reached saturation?

Operationalizing and assessing saturation

The range of empirical work on saturation in qualitative research and detail on the operationalization and assessment metrics used in data-driven studies that address saturation are summarized in Table 1 . In reviewing these studies to inform the development of our approach to assessing saturation, we identified three limitations to the broad application of saturation assessment processes which we sought to overcome: lack of comparability of metrics, reliance on probability theory or random sampling, and retrospective assessment dependent on having a fully coded/analyzed dataset. We discuss each limitation briefly before introducing our alternative approach.

[Table 1: https://doi.org/10.1371/journal.pone.0232076.t001]

Lack of comparability in metrics.

Current operationalizations of saturation vary widely in the criteria used to arrive at a binary determination of saturation having been reached or not reached (e.g., Francis et al. [ 17 ] and Coenen et al. [ 22 ]). Given how different these approaches are–in terms of units of analysis and strictness of saturation thresholds–it is difficult to know how much confidence to have in a conclusion about whether saturation was reached. Whereas quantitative researchers using statistical analysis methods have established options for confidence levels and other metrics to report, there are no agreed-upon metrics to help qualitative researchers interpret the strength of their saturation findings. The method we propose facilitates qualitative researchers’ choice among levels of assessment criteria along with a common description of those criteria that will allow readers to interpret conclusions regarding saturation with more or less confidence, depending on the strictness of the criteria used.

Reliance on probability theory, and/or the assumption of a random sample.

Basing assessments of saturation on probabilistic assumptions (e.g., Lowe et al. [ 26 ], Fugard & Potts [ 21 ], Galvin [ 20 ]) ignores the fact that most qualitative research employs non-probabilistic, purposive sampling suited to the nature and objectives of qualitative inquiry [ 28 ]. Even in cases where random sampling is employed, the open-ended nature of qualitative inquiry doesn’t lend itself well to probability theory or statistical inference to a larger population because response categories are not structured, so are not mutually exclusive. The expression of Theme A is not necessarily to the exclusion of Theme B, nor does the absence of the expression of Theme A necessarily indicate Not-A. Further, from a logistical standpoint, many qualitative researchers do not have the expertise, nor the time required, to perform complicated statistical tests on their datasets. Our approach involves only simple arithmetic and calculation of percentages.

Retrospective assessment dependent on having a fully coded/analyzed dataset.

Methods that calculate saturation based on the proportion of new themes relative to the overall number of themes in a dataset (e.g., Guest et al. [ 9 ], Hennink et al. [ 23 ]) are limited by the total number of interviews conducted: the denominator represents the total number of themes in the fully-analyzed dataset and is fixed, while the number of themes in the numerator gets closer to the denominator with every new interview considered, thus eventually reaching 100% saturation. Saturation will inevitably occur in a retrospectively-assessed, fully-analyzed, fixed-size dataset. The method we outline eliminates this problem by using a subset of data items in the denominator instead of the entire dataset, facilitating better prospective assessment of saturation and offering the advantage of allowing researchers to stop before reaching a pre-specified number of interviews. (Under our approach, however, a measure of percent saturation as defined by these authors will not be available.)

An alternative approach and method to calculating and reporting saturation

For the purposes of our assessment, saturation refers to the point during data analysis at which incoming data points (interviews) produce little or no new useful information relative to the study objectives. Our approach to operationalizing this definition of saturation consists of three distinct elements–the base size, the run length, and the relative amount of incoming new information, or the new information threshold.

Base size.

When assessing saturation, incoming information is weighed against the information already obtained. Base size refers to how we circumscribe the body of information already identified in a dataset to subsequently use as a denominator (similar to Francis et al.’s initial analysis sample). In other words, what is the minimum number of data collection events (i.e., interviews) we should review/analyze to calculate the amount of information already gained? We know that if we use all of the data collection events as our base size, we can reach saturation by default as there are no more data to consider. We also know from previous studies [ 9 , 16 , 29 ] that most novel information in a qualitative dataset is generated early in the process, and generally follows an asymptotic curve, with a relatively sharp decline in new information occurring after just a small number of data collection/analysis events. For this reason, we have chosen to test 4, 5, and 6 interviews as base sizes from which to calculate the total number of unique themes to be used in the denominator of the saturation ratio. The unit of analysis for base size is the data collection event; the items of analysis are unique codes representing themes.

Run length.

A run can be defined as a set of consecutive events or observations, in this case interviews. The run length is the number of interviews within which we look for, and calculate, new information. The number of new themes found in the run defines the numerator in the saturation ratio. Hagaman and Wutich (2017) and Francis et al. (2010), for example, consider runs of three data collection events each time they (re)assess the number of new themes for the numerator, whereas Coenen et al. (2012) include only two events in their data runs. For our analyses we provide both options for run lengths in our calculations–two events and three events–to afford researchers more flexibility. Note that in our analyses, successive runs overlap: each set of interviews shifts to the right or “forward” in time by one event. Fig 1 shows the process, and how base size and run length relate to one another. Here again the unit of analysis is the data collection event; the items of analysis are unique codes.

[Fig 1: https://doi.org/10.1371/journal.pone.0232076.g001]
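To make the relationship between these two elements explicit, the ratio assessed at each step can be written compactly as below. This is our own notational summary of the procedure described in the text (the symbols are not taken from the original figures): $B$ is the base size, $r$ is the run length, $u_i$ is the number of new unique themes first appearing in interview $i$, and $t$ is the last interview of the run under evaluation.

$$
S_t = \frac{\sum_{i=t-r+1}^{t} u_i}{\sum_{i=1}^{B} u_i}
$$

Saturation is indicated at the first run for which $S_t$ falls at or below the chosen cutoff (the new information threshold discussed next).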

New information threshold.

Once the units of analysis for the numerator and denominator are determined, the proportional calculation is simple. But the next question is a purely subjective one: What level of paucity of new information should we accept as indicative of saturation? We propose that furnishing researchers with options—rather than a prescriptive threshold—is a more realistic, transparent and accurate practice. We therefore initially propose two levels of new information that represent the proportion of new information we would accept as evidence that saturation has been reached at a given point in data collection: ≤5% new information and no (0%) new information.

These new information thresholds can be used as benchmarks, similar to how a p-value of <0.05 or <0.01 is used to determine whether enough evidence exists to reject a null hypothesis in statistical analysis. As in statistical analysis—but absent the probability theory—there is no guarantee that saturation is in fact reached when these thresholds are met. But they do provide a transparent way of presenting data saturation assessments that can be subsequently interpreted by other researchers. The lower the new information threshold, the less likely it is that important themes remain undiscovered in later interviews if data collection stops once the threshold is reached. Taken together, the concepts of base size, run length, and new information threshold allow researchers to choose how stringently they wish to apply the saturation concept–and the level of confidence they might have that data saturation was attained for a given sample ( Fig 2 ).

[Fig 2: https://doi.org/10.1371/journal.pone.0232076.g002]
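For researchers who prefer to automate the bookkeeping, the assessment can be scripted in a few lines. The sketch below is ours, not part of the published method; it assumes the analyst can supply, for each interview in the order conducted, the set of unique theme codes identified in it, and the function and variable names are illustrative only.

```python
def assess_saturation(interview_themes, base_size=4, run_length=2, threshold=0.05):
    """Prospective saturation assessment as described in the text.

    interview_themes: list of sets, one per interview in the order conducted,
    each containing the theme codes identified in that interview.
    Returns (saturation_point, ratios): saturation_point is the interview at
    which the new information threshold was met (reported with the run appended,
    e.g. "6+2", in the notation used later in the text), or None if never met;
    ratios maps the last interview of each run to its proportion of new themes.
    """
    seen = set()
    new_per_interview = []            # new (previously unseen) themes per interview
    for themes in interview_themes:
        new = themes - seen
        new_per_interview.append(len(new))
        seen |= new

    base_total = sum(new_per_interview[:base_size])   # denominator: unique themes in the base
    ratios = {}
    for end in range(base_size + run_length, len(interview_themes) + 1):
        run_new = sum(new_per_interview[end - run_length:end])   # new themes in this run
        ratios[end] = run_new / base_total
        if ratios[end] <= threshold:
            return end - run_length, ratios   # threshold met by the run following this interview
    return None, ratios
```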

The advantages of the method we propose are several:

  • It does not assume or require a random sample, nor prior knowledge of theme prevalence.
  • Calculation is simple. It can be done quickly and with no statistical expertise.
  • Metrics can be used prospectively during the data collection and analysis process to ascertain when saturation is reached (providing the possibility of conducting fewer data collection events than planned).
  • Metrics can be used retrospectively , after data collection and analysis are complete, to report on the adequacy of the sample to reach thematic saturation.
  • Options for each metric can be specified prior to analysis or reported after data analysis.
  • The metrics are flexible. Researchers have options for how they describe saturation and can also use the term with more transparency and precision.
  • Saturation is conceptualized as a relative measure. This neutralizes differences in the level of coding granularity among researchers, as the method affects both numerator and denominator.

Application of the approach

An example of prospective data saturation calculation.

Let’s consider a step-by-step example of how this process works, using a hypothetical dataset to illustrate the approach. We will prospectively calculate saturation using a base size of 4 interviews and run length of 2 interviews. For this example, we have selected a new information threshold of ≤ 5% to indicate that we have reached adequate saturation. [The data used for each step are included in Fig 3 , along with indication of the base, runs, and saturation points.]

[Fig 3: https://doi.org/10.1371/journal.pone.0232076.g003]

STEP 1 – Find the number of unique themes for the base.

We start by looking at the first four interviews conducted and summing the number of unique themes identified within this group. The resulting sum, 37, is the denominator in our equation.

[Table: https://doi.org/10.1371/journal.pone.0232076.t002]

STEP 2 – Find the number of unique themes for the first run.

In this example, we’re using a run length of two, so we include data for the next two interviews after the base set–i.e., interviews 5 and 6. After reviewing those interviews, let’s say we identified four new themes in interview 5 and three new themes in interview 6. The number of new themes in this first run is seven.

[Table: https://doi.org/10.1371/journal.pone.0232076.t003]

STEP 3 – Calculate the saturation ratio.

Divide the number of new themes in this run (seven) by the number of unique themes in the base set (37). The quotient reveals 19% new information. This is not below our ≤5% threshold, so we continue.

[Table: https://doi.org/10.1371/journal.pone.0232076.t004]

STEP 4 – Find the number of new unique themes for the next run in the series.

For the next run we add the new themes for the next two interviews, 6 and 7 (note the overlap of interview 6), resulting in a sum of four.

[Table: https://doi.org/10.1371/journal.pone.0232076.t005]

STEP 5 – Update the saturation ratio.

Take the number of new themes in the latest run (four) and divide by the number of themes in the base set (37). This renders a quotient of 11%, still not below our ≤5% threshold. We continue to the next run.

[Table: https://doi.org/10.1371/journal.pone.0232076.t006]

STEP 6 – Find the number of new unique themes for the next run in the series.

For this third run we add the number of new themes identified within interviews 7 and 8.

[Table: https://doi.org/10.1371/journal.pone.0232076.t007]

STEP 7 – Update the saturation ratio.

Take the number of new themes in the latest run (one) and divide by the number of themes in the base set (37). This yields a quotient of roughly 3%.

[Table: https://doi.org/10.1371/journal.pone.0232076.t008]

At this point the proportion of new information added by the last run is below the ≤5% threshold we established, so we stop here after the 8th interview and have a good sense that the amount of new information is diminishing to a level where we could say saturation has been reached based on our subjective metric of ≤5%. Since the last two interviews did not add substantially to the body of information collected, we would say that saturation was reached at interview 6 (each of the next two interviews was completed to see how much new information would be generated and whether this would fall below the set threshold). We would annotate these two extra interviews (indicative of run length) by appending a superscript “+2” to the interview number, to indicate a total of eight interviews were completed. In writing up our saturation assessment, then, we would say that using a base size of 4 we reached the ≤5% new information threshold at 6+2 interviews.
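As a rough check of the arithmetic above, the hypothetical counts from this example can be fed to the assess_saturation sketch presented earlier. The counts below are ours and simply mirror the worked example (any split of the 37 base themes across the first four interviews gives the same result); the theme labels are placeholders.

```python
# Hypothetical data: 37 unique themes across interviews 1-4,
# then 4, 3, 1, and 0 new themes in interviews 5-8.
counts = [10, 12, 9, 6, 4, 3, 1, 0]          # any split summing to 37 over the base works
label = iter(range(sum(counts)))
interviews = [{f"theme_{next(label)}" for _ in range(c)} for c in counts]

stop, ratios = assess_saturation(interviews, base_size=4, run_length=2, threshold=0.05)
print(stop)                                   # 6  -> reported as "6+2"
print({k: round(v, 2) for k, v in ratios.items()})   # {6: 0.19, 7: 0.11, 8: 0.03}
```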

If we wanted to be more conservative, and confident in our conclusion of reaching saturation in this example, we could adjust two parameters of our assessment. We could increase the run length to 3 (or an even larger number), and/or we could set a more stringent new information threshold of no new information. If we consider the hypothetical data set used here (see Fig 3 ) and keep the run length of 2, the 0% new information threshold would have been reached at interview 10+2.

One may still raise two logical questions after reviewing the example process above. The first is “How do we know that we’re not missing important information by capping our sample at n when saturation is indicated?” Put another way, if we had conducted, say, five more interviews would we have gotten additional and important data? The honest answer to this is that we don’t know, and we can never know unless we conduct those five extra interviews, and then five more after that and so on. That is where we rely on the empirical research that shows the rate at which new information emerges decreases over time and that the most common and salient themes are generated early, assuming that we keep the interview questions, sample characteristics, and other study parameters relatively consistent. To further illustrate how saturation may have been affected by doing additional interviews, we include 20 interviews in Fig 3 . The interviews following Interview 12, though yielding four additional themes, remained at or below the ≤5% new information threshold.

The second question is related to the first and pertains to possible order effects. Would the theme identification pattern in a dataset of 20 interviews look the same if interviews #10 through #20 were conducted first? Could new themes start emerging later in the data collection process? Though it is possible an important theme will emerge later in the process/dataset, the empirical studies referenced above demonstrate that the most prevalent, high-level themes are identified very early on in data collection, within about six interviews. But, to further check this, we use a bootstrapping technique on three actual datasets to corroborate findings from these earlier studies and to assess the distributional properties of our proposed metrics. These bootstrap findings give us information on how saturation may be reached at different stopping points as new themes are discovered in new interviews and when the interviews are ordered randomly in different replications of the sample of interviews.

Sample datasets.

We selected three existing qualitative datasets to which we applied the bootstrapping method. Although the datasets were all generated from individual interviews analyzed using an inductive thematic analysis approach, the studies from which they were drawn differed with respect to study population, topics of inquiry, sample heterogeneity, interviewer, and structure of data collection instrument, as described below.

Dataset 1. This study included 40 individual interviews with African American men in the Southeast US about their health seeking behaviors [ 29 ]. The interview guide contained 13 main questions, each with scripted sub-questions. Inductive probing was employed throughout all interviews. The inductive thematic analysis included 11 of the 13 questions and generated 93 unique codes. The study sample was highly homogenous.

Dataset 2. The second dataset consists of 48 individual interviews conducted with (mostly white) mothers in the Southeast US about medical risk and research during pregnancy [ 30 ]. The interview guide contained 13 main questions, each with scripted sub-questions. Inductive probing was employed throughout all interviews. Of note, the 48 interviews were conducted, 12 each, using different modes of data collection: in-person, by video (Skype-like platform), email (asynchronous), or text chat (synchronous). The qualitative thematic analysis included 10 of these questions and generated 85 unique codes.

Dataset 3. This study included 60 interviews with women at higher risk of HIV acquisition—30 participants in Kenya and 30 in South Africa [ 31 ]. The interview was a follow-up qualitative inquiry into women’s responses on a quantitative survey. Though there were 14 questions on the guide, only data from three questions were included in the thematic analysis referenced here. Those three questions generated 55 codes. Participants from the two sites were similar demographically with the exceptions of education and marital status. Substantially more women from the Kenya sample were married and living with their partners (63% versus 3%) and were less likely to have completed at least some secondary education. All interviews were conducted in a local language.

Data from all three studies were digitally recorded and transcribed using a transcription protocol [ 32 ]; transcripts were translated to English for Dataset 3. Transcripts were imported into NVivo [ 33 ] to facilitate coding and analysis. All three datasets were analyzed using a systematic inductive thematic approach [ 2 ], and all codes were explicitly defined in a codebook following a standard template [ 34 ]. For Datasets 1 & 2, two analysts coded each transcript independently and compared code application after each transcript. Discrepancies in code application were resolved through discussion, resulting in consensus-coded documents. For Dataset 3, two coders conducted this type of inter-coder reliability assessment on 20% of the interviews (a standard, more efficient approach than double-coding all interviews [ 2 ]). All three studies were reviewed and approved by the FHI 360 Protection of Human Subjects Committee; the study which produced Dataset 3 was also reviewed and approved by local IRBs in Kenya and South Africa.

Bootstrapping method.

While these three studies offer diverse and analytically rigorous case studies, they provide limited generalizability. To approximate population-level statistics and broaden our validation exercise, we drew empirical bootstrap samples from each of the datasets described above. The bootstrap method is a resampling technique that uses the variability within a sample to estimate the sampling distribution of metrics (in this case saturation metrics) empirically [ 35 ]. This is done by randomly resampling from the sample with replacement (i.e., an item may be selected more than once in a resample) many times in a way that mimics the original sampling scheme. For each qualitative dataset, we generated 10,000 resamples from the original sample. In addition, we randomly ordered the selected transcripts in each resample to offset any order effect on how/when new codes are discovered. For each resample, we calculated the proportion of new themes found in run lengths of two or three new events relative to a base size of four, five or six interviews. We then identified the number of transcripts needed to meet a new information threshold of ≤5% or 0%. Based on these thresholds from 10,000 resamples, for each dataset we computed the median and the 5th and 95th percentiles for number of interviews required to reach each new information threshold across different base sizes and run lengths. The 5th and 95th percentiles provide a nonparametric 90% confidence interval for the number of transcripts needed to reach saturation as defined at these new information thresholds.
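The resampling logic can be condensed into a short script. The sketch below is our illustration of the procedure described above, not the analysis code used for the published results; it reuses the assess_saturation function from the earlier sketch and assumes the same list-of-theme-sets data format.

```python
import random
import statistics

def bootstrap_saturation(interview_themes, n_resamples=10_000,
                         base_size=4, run_length=2, threshold=0.05, seed=1):
    """Empirical bootstrap of the number of interviews needed to reach the
    new information threshold. Each resample draws interviews with replacement;
    because draws are independent, the resampled order is already random."""
    rng = random.Random(seed)
    n = len(interview_themes)
    stops = []
    for _ in range(n_resamples):
        resample = [interview_themes[rng.randrange(n)] for _ in range(n)]
        stop, _ = assess_saturation(resample, base_size, run_length, threshold)
        if stop is not None:
            stops.append(stop)

    cuts = statistics.quantiles(stops, n=20)     # 19 cut points: 5%, 10%, ..., 95%
    return {"median": statistics.median(stops),
            "p5": cuts[0], "p95": cuts[18],      # nonparametric 90% interval
            "replicates_meeting_threshold": len(stops)}
```

A call such as bootstrap_saturation(interviews, run_length=3, threshold=0.0) would then produce the kind of median and 5th/95th percentile summaries reported in the tables that follow.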

Since the total number of codes identified in each dataset was available, we carried out one additional calculation as a way to provide another metric to understand how the median number of interviews to reach a new information threshold related to retrospectively-assessed degrees of saturation with the entire dataset. In this case, once the number of interviews to reach a new information threshold was determined for each run of a dataset, we divided the number of unique themes identified up to that point by the total number of unique themes. This provided a percent–or degree–of saturation for each run of the data, which was then used to generate a median and 5th and 95th percentile for the degree of saturation reached. This can then be compared across base sizes, run lengths, and new information thresholds. [Note that we include this as a further way to understand and validate the proposed approach for calculating saturation, rather than as part of the proposed process.]
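In our notation (again ours, not drawn from the original tables), with $T_i$ the set of themes identified in interview $i$, $t^{*}$ the interview at which a new information threshold was met, and $N$ the total number of interviews in the dataset, this retrospective degree of saturation is:

$$
D_{t^{*}} = \frac{\left|\bigcup_{i=1}^{t^{*}} T_i\right|}{\left|\bigcup_{i=1}^{N} T_i\right|}
$$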

Results

The results from the bootstrapping analyses are presented by dataset in Tables 2 , 3 and 4 . Each table presents the median and percentiles of the bootstrap distribution using base sizes of 4, 5, or 6 and run lengths of 2 and 3, at new information thresholds of ≤5% and no new information.

[Table 2: https://doi.org/10.1371/journal.pone.0232076.t009]

[Table 3: https://doi.org/10.1371/journal.pone.0232076.t010]

[Table 4: https://doi.org/10.1371/journal.pone.0232076.t011]

Note that, as described in the example above, the number of interviews in the run length is not included in the number of interviews to reach the given new information threshold, so the total number of events needed to assess having reached the threshold is two or three more interviews than the given median, depending on the run length of choice. This is indicated by a superscript +2 or +3.

For Dataset 1 ( Table 2 ), at the ≤5% new information threshold, the median number of interviews needed to reach a drop-off in new information was consistent across all base sizes. At a run length of two interviews, the median number of interviews required before a drop in new information was observed was six. This means that relative to the total number of unique codes identified in the first four, five, or six interviews, the amount of new information contributed by interviews 7 and 8 was less than or equal to 5% of the total. At a run length of three interviews, the median number of interviews required before a drop in new information was observed was seven. This means that relative to the total number of unique codes identified in the first four, five, or six interviews, the amount of new information contributed by interviews 8, 9, and 10 was less than or equal to 5% of the total. Across base sizes, for a run length of two, we would say that saturation was indicated at 6+2, while for a run length of three we would say saturation was observed at 7+3, both at the ≤5% new information level. Using the total number of themes in the dataset retrospectively, the number of themes evident across 6–7 interviews corresponded with a median degree of saturation of 78% to 82%.

At the 0% new information threshold, the median number of interviews to indicate saturation was again consistent across base sizes, varying only by the run length. The median numbers of interviews required were 11+2 and 14+3. In other words, at run length 2, it took 11 interviews, plus two more to confirm that no new information was contributed. At run length 3 it was 14 interviews plus three more to confirm no new information. The number of themes evident across 11–14 interviews corresponded with a median degree of saturation of 87% to 89%.

The results for Dataset 2 were nearly identical to Dataset 1 ( Table 3 ). Saturation was indicated at 6 interviews at a run length of 2 (6+2) and 7–8 interviews at run length 3 (7+3 or 8+3). The number of themes evident across 6–8 interviews corresponded with a median degree of saturation of 79% to 82%. At the 0% new information threshold saturation was indicated at the same points as in Dataset 1: 11+2 and 14+3, consistent across all base sizes. In other words, no new information was observed after a median of 11 interviews using a run length of 2, nor after 14 interviews using a run length of 3. Here again, despite a different total number of themes in the overall dataset, the number of new themes evident across 11–14 interviews corresponded with a median degree of saturation of 87% to 89%.

Dataset 3 ( Table 4 ) contained more variation in the sample than the others, which was reflected in a slightly higher median number of interviews and a lower degree of saturation. At the ≤5% new information threshold, the median number of interviews required to reach saturation at a run length of 2 was 8–9 (higher for base size 4). At a run length of 3, the median number of required interviews was 11–12 (again higher for base size 4). The number of new themes evident across 8–12 interviews corresponded with a median degree of saturation of 62% to 71%. At the 0% new information threshold, saturation was indicated at 12+2 and 16+3, consistent across base sizes. The number of new themes evident across 12–16 interviews corresponded with a median degree of saturation of 69% to 76%.

Discussion

In this paper we present a way of assessing thematic saturation in inductive analysis of qualitative interviews. We describe how this method circumvents many of the limitations associated with other ways of conceptualizing, assessing and reporting on saturation within an in-depth interview context. The process can be applied either prospectively, during the data collection and analysis process, or retrospectively, after data collection and analysis are complete. A key advantage is that the metrics are flexible, affording researchers the ability to choose different degrees of rigor by selecting different run lengths and/or new information thresholds. Similarly, the method allows for different options–and greater clarity and transparency–in describing and reporting on saturation.

Based on the bootstrapping analyses we can draw several conclusions. The first is that the results are within the range of what we would have expected based on previous empirical studies. Using the ≤5% new information threshold, our findings indicate that typically 6–7 interviews will capture the majority of themes in a homogenous sample (6 interviews to reach 80% saturation). Our analyses also show that at the higher end of the range for this option (95th percentile) 11–12 interviews might be needed, tracking with existing literature indicating 12 interviews are typically needed to reach higher degrees of saturation.

We can also draw other lessons to inform application of this process:

  • Base size appears to have almost no effect on the outcome. This is important from an efficiency perspective. If our findings hold true in other contexts, it suggests that using a default base size of four interviews is sufficient. In practical terms, this implies that saturation should initially be assessed after six interviews (four in the base, and two in the run). If analyzing data in real time, the results of this initial assessment can then determine whether or not more interviews are needed.
  • Run length has an effect on the outcome, as one would expect. The longer the run length, the greater the number of interviews required to reach saturation. The run length effect is smallest–very minimal–when employing the ≤5% new information threshold. The practical implication of this finding is that researchers can choose a longer run length–e.g., three interviews (or more)–to generate a more conservative assessment of saturation.
  • The new information threshold selected affects the point at which saturation is indicated, as one would expect. The lower the new information threshold–and therefore the more conservative the allowance for recognizing new information–the more interviews are needed to achieve saturation. From an applied standpoint this finding is important in that researchers can feel confident that choosing a more stringent new information threshold–e.g., 0%—will result in a more conservative assessment of saturation, if so desired.

There are, of course, still limitations to this approach. It was developed with applied inductive thematic analyses in mind–those for which the research is designed to answer a relatively narrow question about a specific real-world issue or problem–and the datasets used in the bootstrapping analyses were generated and analyzed within this framework. The applicability of this approach for qualitative research with a different epistemological or phenomenological perspective is yet untested. Another potential limitation of this method relates to codebook structure. When conducting an inductive thematic analysis, researchers must decide on an appropriate codebook organizational scheme (see Hennink et al. [ 23 ] for discussion on this as it relates to saturation). We tested our method on single-tier codebooks, but qualitative researchers often create hierarchical codebooks. A two-tier structure with primary (“parent”) codes and constituent secondary (“child”) codes is a common form, but researchers may also want to identify and look for higher-level, meta-themes (e.g., Hagaman and Wutich [ 19 ]). For any method of assessing saturation, including ours, researchers need to decide at which level they will identify and include themes/codes. For inductive thematic analyses this is a subjective decision that depends on the degree of coding granularity necessary for a particular analytic objective, and how the research team wants to discuss saturation when reporting study findings. That said, a researcher could, with this approach, run and report on saturation analyses of two or more codebooks that contain differing levels of coding granularity.

Tran and colleagues [ 24 ] accurately point out that determining the point of saturation is a difficult endeavor, because “researchers have information on only what they have found” (pg. 17). They further argue that the stopping point for an inductive study is typically determined by the “judgement and experience of researchers”. We acknowledge and agree with these assertions.

Selecting and interpreting levels of rigor, precision, and confidence is a subjective enterprise. What a quantitative researcher accepts, for example, as a large enough effect size or a small enough p-value is a subjective determination and based on convention in a particular field of study. The same can be said for how a researcher chooses to report and interpret statistical findings. P-values can be expressed either in absolute terms (e.g., p = .043) or in several commonly used increments (e.g., p < .05, p < .01, etc.). Likewise, while an odds ratio of 1.2 may be statistically significant, whether or not it’s meaningful in a real-world sense is entirely open to interpretation.

We are advocating for similar flexibility and transparency in assessing and reporting on thematic saturation. We have provided researchers with a method to easily calculate saturation during or after data collection. This method also enables researchers to select different levels of the constituent elements in the process–i.e., Base Size, Run Length and New Information Threshold–based on how confident they wish to be that their interpretations and conclusions are based on a dataset that reached thematic saturation. We hope researchers find this method useful, and that others build on our work by empirically testing the method on different types of datasets drawn from diverse study populations and contexts.

Supporting information

S1 Datasets. Datasets used in bootstrapping analyses.

https://doi.org/10.1371/journal.pone.0232076.s001

Acknowledgments

We would like to thank Betsy Tolley for reviewing an earlier draft of this work and Alissa Bernholc for programming support.

References
  • 2. Guest G, MacQueen K, Namey E. Applied Thematic Analysis. Thousand Oaks, CA: Sage; 2012.
  • 3. Miles MB, Huberman A.M., Saldana J. Qualitative Data Analysis: A Methods Sourcebook. 3 ed. Thousand Oaks, CA: Sage; 2014.
  • 4. Bernard HR, & Ryan G. W. Analyzing qualitative data: Systematic approaches. Thousand Oaks, CA: Sage; 2010.
  • 10. Glaser B, Strauss A. The Discovery of Grounded Theory: Strategies for Qualitative Research. New York, NY: Aldine; 1967.
  • 11. Given LM. 100 Questions (and Answers) about Qualitative Research. Thousand Oaks, CA: Sage; 2016.
  • 12. Birks M, Mills J. Grounded Theory: A Practical Guide. 2 ed. London: Sage; 2015.
  • 13. Olshansky EF. Generating theory using grounded theory methodology. In: de Chesnay M, editor. Nursing Research Using Grounded Theory: Qualitative Designs and Methods in Nursing. New York Springer; 2015. p. 19–28.
  • 14. Cheek J. An untold story: doing funded qualitative research. In: Denzin N, Lincoln Y, editors. Handbook for Qualitative Research. Thousand Oaks, CA: Sage Publications; 2000. p. 401–20.
  • 15. Charmaz K. Constructing Grounded Theory, 2nd ed. Thousand Oaks, CA: Sage; 2014.
  • 16. Morgan M, Fischoff B, Bostrom A, Atman C. Risk Communication: A Mental Models Approach. New York, NY: Cambridge University Press; 2002.
  • 28. Patton M. Qualitative research & evaluation methods: integrating theory and practice. 4th ed. Thousand Oaks, CA: Sage; 2015.
  • 33. QSR. NVivo qualitative data analysis software, version 10. 2012.
  • 34. MacQueen K, McLellan-Lemal E, Bartholow K, Milstein B. Team-based codebook development: structure, process, and agreement. In: Guest G, MacQueen K, editors. Handbook for Team-based Qualitative Research. Lanham, MD: AltaMira Press; 2008. p. 119–36.
  • 35. Lavrakas PJ, editor. Encyclopedia of Survey Research Methods. Thousand Oaks, CA: Sage; 2008.


What is data saturation in qualitative research?

A crucial milestone in qualitative research, data saturation means you can end the data collection phase and move on to your analysis. Here we explain exactly what it means, the telltale signs that you’ve reached it, and how to get there as efficiently as possible.

Author:  Will Webster

Subject Matter Expert:  Jess Oliveros

Data saturation is a point in data collection when new information no longer brings fresh insights to the research questions.

Reaching data saturation means you’ve collected enough data to confidently understand the patterns and themes within the dataset – you’ve got what you need to draw conclusions and make your points. Think of it like a conversation where everything that can be said has been said, and now it’s just repetition.

Why is data saturation most relevant to qualitative research? Because qualitative research is about understanding something deeply, and you can reach a critical mass when trying to do that. Quantitative research, on the other hand, deals in numbers and with predetermined sample sizes, so the concept of data saturation is less relevant.


How to know when data saturation is reached

At the point of data saturation, you start to notice that the information you’re collecting is just reinforcing what you already know rather than providing new insights.

Knowing when you’ve reached this point is fairly subjective – there’s no formula or equation that can be applied. But there are some telltale signs that can apply to any qualitative research project.

When one or multiple of these signs are present, it’s a good time to begin finalizing the data collection phase and move on to a more detailed analysis.

Recurring themes

You start to notice that new data doesn’t bring up new themes or ideas. Instead, it echoes what you’ve already recorded.

This is a sign that you’ve likely tapped into all the main ideas related to your research question.

No new data

When interviews or surveys start to feel like you’re reading from the same script with each participant, you’ve probably reached the limit of diversity in responses. New participants will probably only confirm what you already know.

You’ve collected enough instances and evidence for each category of your analysis that you can support each theme with multiple examples. In other words, your data has become saturated with a depth and richness that illustrates each finding.

Full understanding

You reach a level of familiarity with the subject matter that allows you to accurately predict what your participants will say next. If this is the case, you’ve likely reached data saturation.

Consistency

The data starts to show consistent patterns that support a coherent story. Crucially, inconsistencies and outliers no longer challenge your thinking or significantly alter the narrative you’ve formed.

This consistency across the data set strengthens the validity of your findings.

Is data saturation the goal of qualitative research?

In a word, no. But it’s often a critical milestone.

The true goal of qualitative research is to gain a deep understanding of the subject matter; data saturation indicates that you’ve gathered enough information to achieve that understanding.

That said, working to achieve data saturation in the most efficient way possible should be a goal of your research project.

How can a qualitative research project reach data saturation?

Reaching data saturation is a pivotal point in qualitative research as a sign that you’ve generated comprehensive and reliable findings.

There’s no exact science for reaching this point, but it does consistently demand two things: an adequate sample size and well-screened participants.

Adequate sample size

Achieving data saturation in qualitative research heavily relies on determining an appropriate sample size.

This is less about hitting a specific number and more about ensuring that the range of participants is broad enough to capture the diverse perspectives your research needs – while being focused enough to allow for thorough analysis.

Flexibility is crucial in this process. For example, in a study exploring patient experiences in a hospital, starting with a small group of patients from various departments might be the initial plan. However, as the interviews progress, if new themes continue to emerge, it might indicate the need to broaden the sample size to include more patients or even healthcare providers for a more comprehensive understanding.

An iterative approach like this can help your research to capture the complexity of people’s experiences without overwhelming the research with redundant information. The goal is to reach a point where additional interviews yield little new information, signaling that the range of experiences has been adequately captured.

While it’s important to stay flexible and iterate as you go, it’s always wise to make use of research solutions that can make recommendations on suggested sample size. Such tools can also monitor crucial metrics like completion rate and audience size to keep your research project on track to reach data saturation.

Well-screened participants

In qualitative research, the depth and validity of your findings are, of course, heavily influenced by your participants. This is where the importance of well-screened participants becomes very clear.

In any research project that addresses a complex social issue – from public health strategy to educational reform – having participants who can provide a range of lived experiences and viewpoints is crucial. Generating the best result isn’t about finding a random assortment of individuals, but instead about forming a carefully selected research panel whose experiences and perspectives directly relate to the research questions.

Achieving this means looking beyond surface criteria, like age or occupation, and instead delving into qualities that are relevant to the study, like experiences, attitudes or behaviors. This ensures that the data collected is rich and deeply rooted in real-world contexts, and will ultimately set you on a faster route to data saturation.

At the same time, if you find that your participants aren’t providing the depth or range of insights expected, you probably need to reevaluate your screening criteria. It’s unlikely that you’ll get it right first time – as with determining sample size, don’t be afraid of an iterative process.

To expedite this process, researchers can use digital tools to build ever-richer pictures of respondents, driving more targeted research and more tailored interactions.



Sample sizes for saturation in qualitative research: A systematic review of empirical tests

Affiliations.

  • 1 Hubert Department of Global Health, Rollins School of Public Health, Emory University, 1518 Clifton Rd, Atlanta, GA, 30322, USA. Electronic address: [email protected].
  • 2 Department of Anthropology and Global Health Program, University of California San Diego, 9500 Gilman Drive 0532, La Jolla, CA, 92093, USA. Electronic address: [email protected].
  • PMID: 34785096
  • DOI: 10.1016/j.socscimed.2021.114523

Objective: To review empirical studies that assess saturation in qualitative research in order to identify sample sizes for saturation, strategies used to assess saturation, and guidance we can draw from these studies.

Methods: We conducted a systematic review of four databases to identify studies empirically assessing sample sizes for saturation in qualitative research, supplemented by searching citing articles and reference lists.

Results: We identified 23 articles that used empirical data (n = 17) or statistical modeling (n = 6) to assess saturation. Studies using empirical data reached saturation within a narrow range of interviews (9-17) or focus group discussions (4-8), particularly those with relatively homogenous study populations and narrowly defined objectives. Most studies had a relatively homogenous study population and assessed code saturation; the few outliers (e.g., multi-country research, meta-themes, "code meaning" saturation) needed larger samples for saturation.

Conclusions: Despite varied research topics and approaches to assessing saturation, studies converged on a relatively consistent sample size for saturation for commonly used qualitative research methods. However, these findings apply to certain types of studies (e.g., those with homogenous study populations). These results provide strong empirical guidance on effective sample sizes for qualitative research, which can be used in conjunction with the characteristics of individual studies to estimate an appropriate sample size prior to data collection. This synthesis also provides an important resource for researchers, academic journals, journal reviewers, ethical review boards, and funding agencies to facilitate greater transparency in justifying and reporting sample sizes in qualitative research. Future empirical research is needed to explore how various parameters affect sample sizes for saturation.

Keywords: Focus group discussions; Interviews; Qualitative research; Sample size; Saturation.

Copyright © 2021. Published by Elsevier Ltd.


InterQ Research

What is Data Saturation in Qualitative Research?


  • February 2, 2022

Article Summary: Data saturation is a principle in qualitative research that allows for significantly smaller sample sizes than in quantitative research. It refers to the point at which you start to hear the same themes repeatedly across interviews, so additional interviews add little new information.

Whether you’re in academia or in business, and working in qualitative research, you may have come across or be familiar with the term “saturation” in qualitative research. It’s an important principle, and actually one of the defining characteristics of qualitative research – since qualitative research deals with small sample sizes – so we want to dedicate a post to saturation in qualitative research.

First, let’s discuss sample sizes in qualitative research

Before we dive into saturation in qualitative research, we first need to define qualitative research, and specifically, discuss sample sizes in qualitative research. Unlike quantitative research, which is rooted in statistical analysis and seeks to analyze “how many” or patterns in data, qualitative research focuses on themes. Qualitative data is collected through interviewing, observation, and sometimes task completion. The most common methodologies used in market and UX research are in-depth interviews, focus groups, ideation groups, dyads, triads, ethnographies, and social listening studies.

A key component of qualitative research is smaller sample sizes that are homogenous in nature. This means that instead of interviewing a population with a wide array of characteristics, qualitative research first focuses on segmenting audiences into similar psychographic qualities (often called “personas”). This ensures that the research study is aimed at exploring themes or ideas from a specific subset of a population. For example, a segment might be “small business owners who use Brand X to order supplies for their business.” The idea is to ensure that the segments have well-defined characteristics, which are screened during the recruiting process.

Since there will be highly defined segments, qualitative researchers focus on speaking to a defined number of participants in order to explore themes.

How many participants should a qualitative study have?

We’ve written on the topic of sample sizes before, so we won’t spend this post on that, but typically (again, for homogenous populations), between 10–20 total participants per segment is a solid number. Really, the cutoff comes when you hit saturation in your research, so let’s focus on what saturation is.

Saturation in qualitative research is when, through the course of interviewing (or observation), you notice the same themes coming out, repeatedly. As you interview more and more participants, you stop finding new themes, ideas, opinions, or patterns. Essentially, saturation is when you get diminishing returns, despite talking to more and more people.

How soon will you hit saturation? That depends, of course. For highly homogenous samples (very niche industries/job roles, for example), saturation can happen after as few as 5 interviews. If you have a more diverse population sample (teenagers who use a particular social media app for 20+ hours a week, for example), you may need to interview 30 or more people before you hit saturation of themes.

Saturation can also depend on the specificity of the study. For example, if you are doing a UX study and asking people to test out an app, you may find pretty quickly (after maybe 4-5 interviews) that everyone is having the same reaction or moving through the product in the same way. You’re studying a specific task with defined variables, so saturation is likely to happen sooner.

However, if you are running an ideation workshop, testing reactions to advertising, or studying complex products, saturation will take longer: you may need to talk to 20+ participants before you see those patterns become really defined and you feel you’ve reached saturation, by exploring all of the available themes.

Saturation is extremely important – pay attention to it

If you find that you’ve reached saturation very quickly, don’t necessarily cut the study short right away. First ask: Have we thoroughly covered the audience for this idea/product? If not, recruit additional participants who fit your segment, and test their ideas. Conversely, if you’ve spoken to 45 people and have not heard anything new in the previous 10 interviews, likely you won’t be uncovering too many new themes, so it may be wise to stop at the number you’re currently at.

There are other variables involved in saturation, such as the quality of your recruiting, how you ask your questions (your discussion guide), and the overall focus of your objectives. We will cover those in subsequent posts. For now, we hope you have an understanding of what saturation is and why it’s so critical in qualitative research.

If you want to learn more about conducting qualitative research, check out our training programs from InterQ Learning Labs.



What is the concept of saturation in qualitative research?


“Saturation” is a term that often comes up when we are interested in qualitative methodology and, in particular, qualitative interviews. The concept of saturation is related to the number of interviews to be conducted in qualitative research.

This article defines the concept of saturation in qualitative research, illustrates it with an example, and gives practical advice.


Definition of the concept of saturation

As noted by Marshall et al. (2013) and Guest et al. (2006), the concept of saturation in qualitative research is often invoked but rarely defined. Over the years, it has become a vague term that needs to be defined precisely. The concept of saturation can be formulated in different ways:

  • the point in time when the collection of new qualitative data no longer changes or changes little, your coding manual
  • the point at which each recent qualitative interview produces only previously discovered data
  • the point at which the performance of your research declines, i.e., each new interview makes a smaller contribution than the previous one
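
The first formulation can be checked directly if interviews are coded in order. The sketch below is a simple illustration with made-up codes, not a procedure prescribed by these authors; it tracks how many new codebook entries each successive interview adds.

```python
# Track codebook growth interview by interview (made-up codes, for illustration).

coded_interviews = [
    ["barriers", "cost", "stigma"],
    ["cost", "family_support"],
    ["stigma", "access"],
    ["cost", "barriers"],
    ["family_support", "access"],
]

codebook = {}                 # code -> interview in which it first appeared
new_per_interview = []

for n, codes in enumerate(coded_interviews, start=1):
    new_codes = [c for c in codes if c not in codebook]
    codebook.update({c: n for c in new_codes})
    new_per_interview.append(len(new_codes))

print(new_per_interview)      # [3, 1, 1, 0, 0] -> the codebook has stopped changing
```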


How to measure saturation?

It is reasonable to say that, before 2006, no research had been done on the concept of saturation itself. In an article that has become central to qualitative research, Guest, Bunce, and Johnson (2006) finally opened Pandora’s box and tackled a subject that had previously been treated only superficially.

The three authors wanted to understand the point at which a new interview no longer brings new knowledge. Based on qualitative research carried out in two African countries (Ghana and Nigeria), they measured the number of new codes that appeared per group of six interviews (see graph below).


Code creation during qualitative interviews (based on Guest, Bunce, & Johnson, 2006)

As can be seen, the number of new codes found decreases as more interviews are conducted. This is logical and illustrates the idea of diminishing returns. What is more surprising, however, is the speed at which this decline in yield occurs. Beyond the 18th interview, new codes become rare, and they almost disappear beyond the 36th interview.
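
A rough way to reproduce this kind of tally on your own material is to count, for each successive batch of six coded interviews, how many codes appear for the first time. The sketch below uses simulated data purely for illustration; the batch size of six simply mirrors the grouping used by Guest, Bunce, and Johnson (2006).

```python
# New codes per batch of six interviews (simulated data, for illustration only).
import random

random.seed(1)
all_codes = [f"code_{i}" for i in range(40)]

# Simulate 36 coded interviews; common codes are drawn more often than rare ones.
interviews = [
    set(random.choices(all_codes, weights=range(40, 0, -1), k=8))
    for _ in range(36)
]

seen = set()
new_per_batch = []
for start in range(0, len(interviews), 6):
    batch_codes = set().union(*interviews[start:start + 6])
    new_per_batch.append(len(batch_codes - seen))
    seen |= batch_codes

print(new_per_batch)  # counts generally shrink from batch to batch: diminishing returns
```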

This pattern of diminishing returns led Marshall et al. (2013) to propose a diagram summarizing the situation (see below). A first threshold is drawn to indicate saturation (30 interviews, dotted line), and the idea of negative returns becomes visible around the 35th interview.


Number of qualitative interviews to be conducted and occurrence of saturation in a phenomenological research study (from Marshall et al. 2013)

Does this mean that 30 interviews are required for any qualitative research? While many researchers agree on the “magic number” of 30 (see, for example, Baker and Edwards, 2012), others cautiously remind us that several variables must be taken into account when deciding on the ideal sample size (Morse, 2000).

At the heart of the notion of saturation is, of course, the question of how many qualitative interviews to conduct. Robust research, unassailable from a methodological point of view, should go beyond simply invoking the concept of saturation and should genuinely strive to show that interview n+1 brings nothing new compared with interview n.

To do this, it is essential to code your interviews and, here again, to avoid weak methodological approaches that settle for notes taken “on the fly.” The time spent preparing and conducting interviews justifies making the most of them. To learn all about qualitative interview analysis, we have written a guide that you will find here.
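
If your interviews are coded, this n+1 check can be scripted. The sketch below, with invented codes and offered only as an illustration, reports for each interview what share of its codes had already appeared in earlier interviews; once that share reaches and stays at 100%, the next interview is bringing nothing new.

```python
# Share of already-known codes in each successive interview (invented data).

coded_interviews = [
    {"access", "cost", "trust"},
    {"cost", "waiting_time"},
    {"trust", "access", "staff_attitude"},
    {"cost", "access"},
    {"waiting_time", "trust"},
]

seen = set()
for n, codes in enumerate(coded_interviews, start=1):
    share_known = len(codes & seen) / len(codes)
    print(f"Interview {n}: {share_known:.0%} of its codes were already known")
    seen |= codes
# Interviews 4 and 5 report 100%: they add nothing beyond interviews 1-3.
```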

References

Baker, S. E., & Edwards, R. (2012). How many qualitative interviews is enough? Expert voices and early career reflections on sampling and cases in qualitative research. National Centre for Research Methods.

Guest, G., Bunce, A., & Johnson, L. (2006). How many interviews are enough? An experiment with data saturation and variability. Field Methods, 18(1), 59-82.

Marshall, B., Cardon, P., Poddar, A., & Fontenot, R. (2013). Does sample size matter in qualitative research? A review of qualitative interviews in IS research. Journal of Computer Information Systems, 54(1), 11-22.

Morse, J. M. (2000). Determining sample size. Qualitative Health Research, 10(1), 3-5.


