Research Bias 101: What You Need To Know
By: Derek Jansen (MBA) | Expert Reviewed By: Dr Eunice Rautenbach | September 2022
If you’re new to academic research, research bias (also sometimes called researcher bias) is one of the many things you need to understand to avoid compromising your study. If you’re not careful, research bias can ruin the credibility of your study.
In this post, we’ll unpack the thorny topic of research bias. We’ll explain what it is, look at some common types of research bias and share some tips to help you minimise the potential sources of bias in your research.
Overview: Research Bias 101
- What is research bias (or researcher bias)?
- Bias #1 – Selection bias
- Bias #2 – Analysis bias
- Bias #3 – Procedural (admin) bias
So, what is research bias?
Well, simply put, research bias is when the researcher – that’s you – intentionally or unintentionally skews the process of a systematic inquiry, which then, of course, skews the outcomes of the study. In other words, research bias is what happens when you affect the results of your research by influencing how you arrive at them.
For example, if you planned to research the effects of remote working arrangements across all levels of an organisation, but your sample consisted mostly of management-level respondents, you’d run into a form of research bias. In this case, excluding input from lower-level staff (in other words, not getting input from all levels of staff) means that the results of the study would be ‘biased’ in favour of a certain perspective – that of management.
Of course, if your research aims and research questions were only interested in the perspectives of managers, this sampling approach wouldn’t be a problem – but that’s not the case here, as there’s a misalignment between the research aims and the sample.
Now, it’s important to remember that research bias isn’t always deliberate or intended. Quite often, it’s just the result of a poorly designed study, or practical challenges in terms of getting a well-rounded, suitable sample. While perfect objectivity is the ideal, some level of bias is generally unavoidable when you’re undertaking a study. That said, as a savvy researcher, it’s your job to reduce potential sources of research bias as much as possible.
To minimise potential bias, you first need to know what to look for. So, next up, we’ll unpack three common types of research bias we see at Grad Coach when reviewing students’ projects. These include selection bias, analysis bias, and procedural bias. Keep in mind that there are many different forms of bias that can creep into your research, so don’t take this as a comprehensive list – it’s just a useful starting point.
Bias #1 – Selection Bias
First up, we have selection bias. The example we looked at earlier (about only surveying management as opposed to all levels of employees) is a prime example of this type of research bias. In other words, selection bias occurs when your study’s design automatically excludes a relevant group from the research process and, therefore, negatively impacts the quality of the results.
With selection bias, the results of your study will be biased towards the group that it includes or favours, meaning that you’re likely to arrive at prejudiced results. For example, research into government policies that only includes participants who voted for a specific party is going to produce skewed results, as the views of those who voted for other parties will be excluded.
Selection bias commonly occurs in quantitative research, as the sampling strategy adopted can have a major impact on the statistical results. That said, selection bias does of course also come up in qualitative research, as there’s still plenty of room for skewed samples. So, it’s important to pay close attention to the makeup of your sample and make sure that you adopt a sampling strategy that aligns with your research aims. Of course, you’ll seldom achieve a perfect sample, and that’s okay. But you need to be aware of how your sample may be skewed and factor this into your thinking when you analyse the resultant data.
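To make this concrete, here’s a minimal sketch (in Python, with made-up numbers) of one practical check: comparing the composition of your sample against what you know about the population. Any group that is heavily over- or under-represented is an early warning that selection bias may be creeping in.

```python
# Illustrative check with hypothetical figures: how does the sample's make-up
# compare with the population it is meant to represent?
population_share = {"management": 0.15, "admin": 0.35, "operations": 0.50}
sample_counts = {"management": 62, "admin": 21, "operations": 17}  # survey returns

total = sum(sample_counts.values())
for level, count in sample_counts.items():
    share = count / total
    gap = share - population_share[level]
    print(f"{level:<11} sample {share:.0%} vs population {population_share[level]:.0%} (gap {gap:+.0%})")
```

In this made-up case, management makes up 62% of the sample but only 15% of the organisation – exactly the kind of skew described above.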
Bias #2 – Analysis Bias
Next up, we have analysis bias. Analysis bias occurs when the analysis itself emphasises or discounts certain data points so as to favour a particular result (often the researcher’s own expected result or hypothesis). In other words, analysis bias happens when you prioritise the presentation of data that supports a certain idea or hypothesis, rather than presenting all the data indiscriminately.
For example, if your study was looking into consumer perceptions of a specific product, you might present more analysis of data that reflects positive sentiment toward the product, and give less real estate to the analysis that reflects negative sentiment. In other words, you’d cherry-pick the data that suits your desired outcomes and as a result, you’d create a bias in terms of the information conveyed by the study.
Although this kind of bias is common in quantitative research, it can just as easily occur in qualitative studies, given the amount of interpretive power the researcher has. This may not be intentional or even noticed by the researcher, given the inherent subjectivity in qualitative research. As humans, we naturally search for and interpret information in a way that confirms or supports our prior beliefs or values (in psychology, this is called “confirmation bias”). So, don’t make the mistake of thinking that analysis bias is always intentional and you don’t need to worry about it because you’re an honest researcher – it can creep up on anyone.
To reduce the risk of analysis bias, a good starting point is to determine your data analysis strategy in as much detail as possible, before you collect your data. In other words, decide, in advance, how you’ll prepare the data, which analysis method you’ll use, and be aware of how different analysis methods can favour different types of data. Also, take the time to reflect on your own preconceived notions and expectations regarding the analysis outcomes (in other words, what you expect to find in the data), so that you’re fully aware of the potential influence you may have on the analysis – and therefore, hopefully, can minimise it.
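One practical way to do this is to write the plan down, in a fixed form, before any data arrive. The sketch below is purely illustrative (the outcome, preparation steps and expectations are all hypothetical); the point is that anything not in the plan gets reported as exploratory rather than quietly folded into the main analysis.

```python
# Hypothetical analysis plan, frozen before data collection begins.
ANALYSIS_PLAN = {
    "primary_outcome": "overall product sentiment (positive / neutral / negative)",
    "data_preparation": ["remove incomplete responses",
                         "code open-ended answers with the agreed scheme"],
    "analysis_method": "report all sentiment categories in full, not just favourable ones",
    "researcher_expectations": "noted in advance: I expect mostly positive sentiment",
}

def run_planned_analysis(data, plan=ANALYSIS_PLAN):
    """Executes only the steps written into the plan; anything else is exploratory."""
    for step in plan["data_preparation"]:
        print("preparing:", step)
    print("analysing:", plan["analysis_method"])

run_planned_analysis(data=None)  # placeholder call with no real data
```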
Bias #3 – Procedural Bias
Last but definitely not least, we have procedural bias , which is also sometimes referred to as administration bias . Procedural bias is easy to overlook, so it’s important to understand what it is and how to avoid it. This type of bias occurs when the administration of the study, especially the data collection aspect, has an impact on either who responds or how they respond.
A practical example of procedural bias would be when participants in a study are required to provide information under some form of constraint. For example, participants might be given insufficient time to complete a survey, resulting in incomplete or hastily-filled out forms that don’t necessarily reflect how they really feel. This can happen really easily, if, for example, you innocently ask your participants to fill out a survey during their lunch break.
Another form of procedural bias can happen when you improperly incentivise participation in a study. For example, offering a reward for completing a survey or interview might incline participants to provide false or inaccurate information just to get through the process as fast as possible and collect their reward. It could also potentially attract a particular type of respondent (a freebie seeker), resulting in a skewed sample that doesn’t really reflect your demographic of interest.
The format of your data collection method can also potentially contribute to procedural bias. If, for example, you decide to host your survey or interviews online, this could unintentionally exclude people who are not particularly tech-savvy, don’t have a suitable device or just don’t have a reliable internet connection. On the flip side, some people might find in-person interviews a bit intimidating (compared to online ones, at least), or they might find the physical environment in which they’re interviewed to be uncomfortable or awkward (maybe the boss is peering into the meeting room, for example). Either way, these factors all result in less useful data.
Although procedural bias is more common in qualitative research, it can come up in any form of fieldwork where you’re actively collecting data from study participants. So, it’s important to consider how your data is being collected and how this might impact respondents. Simply put, you need to take the respondent’s viewpoint and think about the challenges they might face, no matter how small or trivial these might seem. So, it’s always a good idea to have an informal discussion with a handful of potential respondents before you start collecting data and ask for their input regarding your proposed plan upfront.
Let’s Recap
Ok, so let’s do a quick recap. Research bias refers to any instance where the researcher, or the research design, negatively influences the quality of a study’s results, whether intentionally or not.
The three common types of research bias we looked at are:
- Selection bias – where a skewed sample leads to skewed results
- Analysis bias – where the analysis method and/or approach leads to biased results – and,
- Procedural bias – where the administration of the study, especially the data collection aspect, has an impact on who responds and how they respond.
As I mentioned, there are many other forms of research bias, but we can only cover a handful here. So, be sure to familiarise yourself with as many potential sources of bias as possible to minimise the risk of research bias in your study.
Understanding the different types of bias in research (2024 guide)
Last updated: 6 October 2023 | Reviewed by Miroslav Damyanov
Research bias is an often invisible force that exaggerates or downplays particular characteristics of whatever you are studying. When left unchecked, it can significantly undermine the validity and reliability of your research.
In a perfect world, every research project would be free of any trace of bias—but for this to happen, you need to be aware of the most common types of research bias that plague studies.
Read this guide to learn more about the most common types of bias in research and what you can do to design and improve your studies to create high-quality research results.
- What is research bias?
Research bias is the tendency for qualitative and quantitative research studies to contain prejudice or preference for or against a particular group of people, culture, object, idea, belief, or circumstance.
Bias is rarely based on observed facts. In most cases, it results from societal stereotypes, systemic discrimination, or learned prejudice.
Every human develops their own set of biases throughout their lifetime as they interact with their environment. Often, people are unaware of their own biases until they are challenged—and this is why it’s easy for unintentional bias to seep into research projects.
Left unchecked, bias ruins the validity of research. So, to get the most accurate results, researchers need to know about the most common types of research bias and understand how their study design can address and avoid these outcomes.
- The two primary types of bias
Historically, there are two primary types of bias in research:
Conscious bias
Conscious bias is the practice of intentionally voicing and sharing a negative opinion about a particular group of people, beliefs, or concepts.
Characterized by negative emotions and opinions of the target group, conscious bias is often defined as intentional discrimination.
In most cases, this type of bias has no place in research projects, as it is unjust, unfair, and unscientific.
Unconscious bias
An unconscious bias is a negative response to a particular group of people, beliefs, or concepts that is not identified or intentionally acted upon by the bias holder.
Because of this, unconscious bias is particularly dangerous. These warped beliefs shape how someone conducts themselves and their research, yet the bias holder cannot readily identify the moral and ethical issues with their own behavior.
- Examples of commonly occurring research bias
Humans use countless biases daily to quickly process information and make sense of the world. But, to create accurate research studies and get the best results, you must remove these biases from your study design.
Here are some of the most common types of research biases you should look out for when planning your next study:
Information bias
During any study, tampering with data collection is widely agreed to be bad science. But what if your study design includes information biases you are unaware of?
Also known as measurement bias, information bias occurs when one or more of the key study variables are not correctly measured, recorded, or interpreted. As a result, the study’s perceived outcome may be inaccurate due to data misclassification, omission, or obfuscation (obscuring).
Observer bias
Observer bias occurs when researchers don’t have a clear understanding of their own personal assumptions and expectations. During observational studies, it’s possible for a researcher’s personal biases to impact how they interpret the data. This can dramatically affect the study’s outcome.
To combat this type of bias, the study should be double-blind: participants don’t know which group they are in, and observers don’t know which group they are observing.
Regression to the mean (RTM)
Bias can also impact research statistics.
Regression to the mean (RTM) is a statistical phenomenon whereby, if a first clinical reading is extreme in value (i.e., very high or very low compared to the average), the second reading will tend to fall closer to the average.
Here’s an example: you might be nervous when a doctor takes your blood pressure in the doctor’s surgery. The first result might be quite high. This is a phenomenon known as “white coat syndrome.” When your blood pressure is retaken to double-check the value, it is more likely to be closer to typical values.
So, which value is more accurate, and which should you record as the truth?
The answer depends on the specific design of your study. However, using control groups is usually recommended for studies with a high risk of RTM.
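If you’re curious why the second reading drifts back toward typical values, the small simulation below (with assumed, illustrative numbers) shows RTM in action: people are selected because their first blood-pressure reading was extreme, and their second reading is, on average, closer to the true mean even though nothing about them has changed.

```python
# Minimal regression-to-the-mean simulation with made-up parameters.
import random

random.seed(1)
true_mean, person_sd, noise_sd = 120, 10, 8

people = [random.gauss(true_mean, person_sd) for _ in range(10_000)]   # stable "true" values
first  = [p + random.gauss(0, noise_sd) for p in people]               # noisy first reading
second = [p + random.gauss(0, noise_sd) for p in people]               # noisy second reading

# Keep only those whose first reading looked alarmingly high.
selected = [(f, s) for f, s in zip(first, second) if f > 140]
avg_first  = sum(f for f, _ in selected) / len(selected)
avg_second = sum(s for _, s in selected) / len(selected)
print(f"average first reading of the 'high' group:  {avg_first:.1f}")
print(f"average second reading of the same group:   {avg_second:.1f} (closer to {true_mean})")
```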
Performance bias
A performance bias can develop if participants understand the study’s nature or desired outcomes. This can harm the study’s accuracy, as participants may adjust their behavior away from their normal routines to improve their performance. This results in inaccurate data and study results.
This is a common bias type in medical and health studies, particularly those studying the differences between two lifestyle choices.
To reduce performance bias, researchers should strive to keep members of the control and study groups unaware of the other group’s activities. This method is known as “blinding.”
Recall bias
How good is your memory? Chances are, it’s not as good as you think—and the older the memory, the more inaccurate and biased it will become.
A recall bias commonly occurs in self-reporting studies requiring participants to remember past information. While people can remember big-picture events (like the day they got married or landed their first job), routine occurrences like what they do after work every Tuesday are harder to recall.
To offset this type of bias, design a study that engages with participants over both short and long time frames, which helps keep the relevant information top of mind.
Researcher bias
Researcher bias (also known as interviewer bias) occurs due to the researcher’s personal beliefs or tendencies that influence the study’s results or outcomes.
These types of biases can be intentional or unintentional, and most are driven by personal feelings, historical stereotypes, and assumptions about the study’s outcome before it has even begun.
Question order bias
Survey design and question order are areas of frequent contention for researchers. These elements are essential to quality study design and can either prevent or invite answer bias.
When designing a research study that collects data via survey questions, the order of the questions presented can impact how the participants answer each subsequent question. Leading questions (questions that guide participants toward a particular answer) are perfect examples of this. When included early in the survey, they can sway a participant’s opinions and answers as they complete the questionnaire.
This is known as systematic distortion, meaning each question answered after the guiding questions is impacted or distorted by the wording of the questions before.
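A common way to blunt question order bias is to randomise the order of the (non-leading) questions each respondent sees. The sketch below is a simple Python illustration; the questions and respondent IDs are hypothetical.

```python
# Shuffle question order per respondent so one question's wording cannot
# systematically colour everything that follows it.
import random

questions = [
    "How satisfied are you with your commute?",
    "How satisfied are you with your workload?",
    "How satisfied are you with remote-working options?",
]

def questions_for(respondent_id):
    rng = random.Random(respondent_id)  # reproducible order per respondent
    order = questions[:]
    rng.shuffle(order)
    return order

print(questions_for(7))
```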
Demand characteristics
Body language and social cues play a significant role in human communication—and this also rings true for the validity of research projects.
A demand characteristic bias can occur due to a verbal or non-verbal cue that encourages research participants to behave in a particular way.
Imagine a researcher studying a group of new business graduates and their experience applying for jobs one, three, and six months after graduation. If the researcher scowls every time a participant mentions not using a cover letter, that reaction may encourage participants to change their answers, harming the study’s outcome and resulting in less accurate results.
Courtesy bias
Courtesy bias arises from not wanting to share negative or constructive feedback or answers—a common human tendency.
You’ve probably been in this situation before. Think of a time when you had a negative opinion or perspective on a topic, but you felt the need to soften or reduce the harshness of your feedback to prevent someone’s feelings from being hurt.
This type of bias also occurs in research. Without a comfortable and non-judgmental environment that encourages honest responses, courtesy bias can result in inaccurate data intake.
Studies based on small group interviews, focus groups, or any in-person surveys are particularly vulnerable to this type of bias because people are less likely to share negative opinions in front of others or to someone’s face.
Extreme responding
Extreme responding refers to the tendency for people to respond on one side of the scale or the other, even if these extreme answers don’t reflect their true opinion.
This is a common bias in surveys, particularly online surveys asking about a person’s experience or personal opinions (think questionnaires that ask you to decide if you strongly disagree, disagree, agree, or strongly agree with a statement).
When this occurs, the data will be skewed. It will be overly positive or negative—not accurate. This is a problem because the data can impact future decisions or study outcomes.
Writing different styles of questions and asking for follow-up interviews with a small group of participants are a few options for reducing the impact of this type of bias.
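If your responses are already in digital form, a simple screen can flag likely extreme responders for review or a follow-up conversation. The sketch below uses made-up response data and an arbitrary 90% threshold.

```python
# Flag respondents whose answers sit almost entirely at the ends of a 1-5 scale.
responses = {            # hypothetical respondent IDs and Likert answers
    "r01": [5, 5, 5, 5, 5, 5],
    "r02": [2, 3, 4, 3, 2, 4],
    "r03": [1, 5, 1, 5, 5, 1],
}

def extreme_share(answers, low=1, high=5):
    return sum(a in (low, high) for a in answers) / len(answers)

flagged = [rid for rid, ans in responses.items() if extreme_share(ans) >= 0.9]
print("possible extreme responders:", flagged)
```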
Social desirability bias
Everyone wants to be liked and respected. As a result, social desirability can skew survey answers.
It’s common for people to answer questions in a way that they believe will earn them favor, respect, or agreement with researchers. This is a common bias type for studies on taboo or sensitive topics like alcohol consumption or physical activity levels, where participants feel vulnerable or judged when sharing their honest answers.
Reassuring participants through guaranteed anonymity and safe, respectful research practices is one way to offset the impact of social desirability bias.
Selection bias
For the most accurate results, researchers need to understand their chosen population before accepting participants. Failure to do this results in selection bias, which is caused by an inaccurate or misrepresented selection of participants that don’t truly reflect the chosen population.
Self-selection bias
To collect data, researchers in many studies require participants to volunteer their time and experiences. This results in a study design that is automatically biased toward people who are more likely to get involved.
People who are more likely to voluntarily participate in a study are not reflective of the common experience of a broad, diverse population. Because of this, any information collected from this type of study will contain a self-selection bias.
To avoid this type of bias, researchers can use random assignment (using control versus treatment groups to divide the study participants after they volunteer).
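As a minimal illustration of random assignment, the sketch below shuffles a hypothetical list of volunteers and splits them into control and treatment groups, so chance rather than the researcher decides who ends up where.

```python
# Random assignment of volunteers into control and treatment groups.
import random

volunteers = [f"participant_{i:02d}" for i in range(1, 21)]  # hypothetical IDs
rng = random.Random(42)
rng.shuffle(volunteers)

half = len(volunteers) // 2
groups = {"control": volunteers[:half], "treatment": volunteers[half:]}
print({name: members[:3] for name, members in groups.items()})  # preview
```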
Sampling or ascertainment bias
When choosing participants for a study, take care to select people who are representative of the overall population being researched. Failure to do this will result in sampling bias.
For example, if researchers aim to learn more about how university stress impacts sleep quality but only choose engineering students as participants, the study won’t reflect the wider population they want to learn more about.
To avoid sampling bias, researchers must first have a strong understanding of their chosen study population. Then, they should take steps to ensure that any person within that population has an equal chance of being selected for the study.
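One way to give every person in the population the same chance of selection is simple random sampling from the full sampling frame. The sketch below uses a hypothetical student roster; in practice the frame would come from institutional records.

```python
# Draw a simple random sample from the whole roster, not just one faculty.
import random
from collections import Counter

roster = ([("engineering", i) for i in range(4000)]
          + [("humanities", i) for i in range(3000)]
          + [("sciences", i) for i in range(3000)])

rng = random.Random(0)
sample = rng.sample(roster, k=200)
print(Counter(faculty for faculty, _ in sample))  # roughly mirrors the roster's mix
```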
Attrition bias
Attrition bias occurs when participants who drop out of a study differ in important ways from those who stay until the end.
People who have a negative experience, struggle with the intervention, or simply lose interest are often more likely to withdraw, so the data collected from those who remain can paint a skewed, overly favorable picture of the outcome.
Survivorship bias
In medical clinical trials and studies, a survivorship bias may develop if only the results and data from participants who survived the trial are studied. The same problem arises when participants who were unable to complete the entire trial are excluded, not just those who passed away during the study.
In long-term studies that evaluate new medications or therapies for high-mortality diseases like aggressive cancers, choosing to only consider the success rate, side effects, or experiences of those who completed the study eliminates a large portion of important information. This disregarded information may have offered insights into the quality, efficacy, and safety of the treatment being tested.
Nonresponse bias
A nonresponse bias occurs when a portion of chosen participants decide not to complete or participate in the study. This is a common issue in survey-based research (especially online surveys).
In survey-based research, the issue of response versus nonresponse rates can impact the quality of the information collected. Every nonresponse is a missed opportunity to get a better understanding of the chosen population, whether participants choose not to reply based on subject apathy, shame, guilt, or a lack of skills or resources.
To combat this bias, improve response rates using multiple different survey styles. These might include in-person interviews, mailed paper surveys, and virtual options. However, note that these efforts will never completely remove nonresponse bias from your study.
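A quick way to see where nonresponse is concentrated is to compare response rates across the different survey modes you offer. The sketch below uses made-up counts.

```python
# Compare response rates across survey modes (hypothetical figures).
invited   = {"online": 500, "mail": 300, "in_person": 120}
responded = {"online": 180, "mail": 150, "in_person": 95}

for mode in invited:
    rate = responded[mode] / invited[mode]
    print(f"{mode:<10} response rate: {rate:.0%}")
```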
Cognitive bias
Cognitive biases result from repeated errors in thinking or memory caused by misinterpreting information, oversimplifying a situation, or making inaccurate mental shortcuts. They can be tricky to identify and account for, as everyone lives with invisible cognitive biases that govern how they understand and interact with their surrounding environment.
Anchoring bias
When given a list of information, humans have a tendency to overemphasize (or anchor onto) the first thing mentioned.
For example, if you ask people to remember a grocery list that starts with apples, bananas, yogurt, and bread, they are most likely to remember apples over any of the other items. This is because apples were mentioned first, despite not being any more important than the rest of the list.
This habit inflates the importance and significance of that one piece of information, which can impact how you respond to or feel about the other, equally important, concepts being mentioned.
Halo effect
The halo effect explains the tendency for people to form opinions or assumptions about other people based on one specific characteristic. Most commonly seen in studies about physical appearance and attractiveness, the halo effect can cause either a positive or negative response depending on how the defined trait is perceived.
Framing effect
Framing effect bias refers to how you perceive information based on how it’s presented to you.
To demonstrate this, decide which of the following desserts sounds more delicious.
“Made with 95% natural ingredients!”
“Contains only 5% non-natural ingredients!”
Both of these claims say the same thing, but most people have a framing effect bias toward the first claim as it’s positive and more impactful.
This type of bias can significantly impact how people perceive or react to data and information.
The misinformation effect
The misinformation effect refers to the brain’s tendency to alter or misremember past experiences when it has since been fed inaccurate information. This type of bias can significantly impact how a person feels about, remembers, or trusts the authority of their previous experiences.
Confirmation bias
Confirmation bias occurs when someone unconsciously prefers or favors information that confirms or validates their beliefs and ideas.
In some cases, confirmation bias is so strong that people find themselves disregarding information that counters their worldview, resulting in poorer research accuracy and quality.
We all like being proven right (even if we are testing a research hypothesis), so this is a commonly occurring cognitive bias that needs to be addressed during any scientific study.
Availability heuristic
All humans contextualize and understand the world around them based on their past experiences and memories. Because of this, people tend to have an availability bias toward explanations they have heard before.
People are more likely to assume or gravitate toward reasoning and ideas that align with past experience. This is known as the availability heuristic. Information and connections that are more available or accessible in your memory might seem more likely than other alternatives. This can impact the validity of research efforts.
- How to avoid bias in your research
Research is a compelling, complex, and essential part of human growth and learning, but collecting the most accurate data possible also poses plenty of challenges. The starting point is awareness: understand your chosen population before sampling, blind participants and observers where possible, decide on your analysis approach before the data arrive, and give respondents a safe, anonymous way to answer honestly.
The Ultimate Guide to Qualitative Research - Part 1: The Basics
Bias in research
In a purely objective world, research bias would not exist because knowledge would be a fixed and unmovable resource; either one knows about a particular concept or phenomenon, or they don't. However, qualitative research and the social sciences both acknowledge that subjectivity and bias exist in every aspect of the social world, which naturally includes the research process too. This bias is manifest in the many different ways that knowledge is understood, constructed, and negotiated, both in and out of research.
Understanding research bias has profound implications for data collection methods and data analysis, requiring researchers to take particular care of how to account for the insights generated from their data.
Research bias, often unavoidable, is a systematic error that can creep into any stage of the research process, skewing our understanding and interpretation of findings. From data collection to analysis, interpretation, and even publication, bias can distort the truth we seek to capture and communicate in our research.
It’s also important to distinguish between bias and subjectivity, especially when engaging in qualitative research. Most qualitative methodologies are based on epistemological and ontological assumptions that there is no such thing as a fixed or objective world that exists “out there” that can be empirically measured and understood through research. Rather, many qualitative researchers embrace the socially constructed nature of our reality and thus recognize that all data is produced within a particular context by participants with their own perspectives and interpretations. Moreover, the researcher’s own subjective experiences inevitably shape how they make sense of the data. These subjectivities are considered to be strengths, not limitations, of qualitative research approaches, because they open new avenues for knowledge generation. This is also why reflexivity is so important in qualitative research. When we refer to bias in this guide, on the other hand, we are referring to systematic errors that can negatively affect the research process but that can be mitigated through researchers’ careful efforts.
To fully grasp what research bias is, it's essential to understand the dual nature of bias. Bias is not inherently evil. It's simply a tendency, inclination, or prejudice for or against something. In our daily lives, we're subject to countless biases, many of which are unconscious. They help us navigate our world, make quick decisions, and understand complex situations. But when conducting research, these same biases can cause significant issues.
Research bias can affect the validity and credibility of research findings, leading to erroneous conclusions. It can emerge from the researcher's subconscious preferences or the methodological design of the study itself. For instance, if a researcher unconsciously favors a particular outcome of the study, this preference could affect how they interpret the results, leading to a type of bias known as confirmation bias.
Research bias can also arise due to the characteristics of study participants. If the researcher selectively recruits participants who are more likely to produce desired outcomes, this can result in selection bias.
Another form of bias can stem from data collection methods. If a survey question is phrased in a way that encourages a particular response, this can introduce response bias. Moreover, inappropriate survey questions can have a detrimental effect on future research if such studies are seen by the general population as biased toward particular outcomes depending on the preferences of the researcher.
Bias can also occur during data analysis. In qualitative research, for instance, the researcher's preconceived notions and expectations can influence how they interpret and code qualitative data, a type of bias known as interpretation bias. It's also important to note that quantitative research is not free of bias either, as sampling bias and measurement bias can threaten the validity of any research findings.
Given these examples, it's clear that research bias is a complex issue that can take many forms and emerge at any stage in the research process. This section will delve deeper into specific types of research bias, provide examples, discuss why it's an issue, and provide strategies for identifying and mitigating bias in research.
What is an example of bias in research?
Bias can appear in numerous ways. One example is confirmation bias, where the researcher has a preconceived explanation for what is going on in their data, and any disconfirming evidence is (unconsciously) ignored. For instance, a researcher conducting a study on daily exercise habits might be inclined to conclude that meditation practices lead to greater engagement in exercise because that researcher has personally experienced these benefits. However, conducting rigorous research entails assessing all the data systematically and verifying one’s conclusions by checking for both supporting and refuting evidence.
What is a common bias in research?
Confirmation bias is one of the most common forms of bias in research. It happens when researchers unconsciously focus on data that supports their ideas while ignoring or undervaluing data that contradicts their ideas. This bias can lead researchers to mistakenly confirm their theories, despite having insufficient or conflicting evidence.
What are the different types of bias?
There are several types of research bias, each presenting unique challenges. Some common types include:
Confirmation bias: As already mentioned, this happens when a researcher focuses on evidence supporting their theory while overlooking contradictory evidence.
Selection bias: This occurs when the researcher's method of choosing participants skews the sample in a particular direction.
Response bias: This happens when participants in a study respond inaccurately or falsely, often due to misleading or poorly worded questions.
Observer bias (or researcher bias): This occurs when the researcher unintentionally influences the results because of their expectations or preferences.
Publication bias: This type of bias arises when studies with positive results are more likely to get published, while studies with negative or null results are often ignored.
Analysis bias: This type of bias occurs when the data is manipulated or analyzed in a way that leads to a particular result, whether intentionally or unintentionally.
What is an example of researcher bias?
Researcher bias, also known as observer bias, can occur when a researcher's expectations or personal beliefs influence the results of a study. For instance, if a researcher believes that a particular therapy is effective, they might unconsciously interpret ambiguous results in a way that supports the efficacy of the therapy, even if the evidence is not strong enough.
Even quantitative research methodologies are not immune from bias from researchers. Market research surveys or clinical trial research, for example, may encounter bias when the researcher chooses a particular population or methodology to achieve a specific research outcome. Questions in customer feedback surveys whose data is employed in quantitative analysis can be structured in such a way as to bias survey respondents toward certain desired answers.
Identifying and avoiding bias in research
As we will remind you throughout this chapter, bias is not a phenomenon that can be removed altogether, nor should we think of it as something that should be eliminated. In a subjective world involving humans as researchers and research participants, bias is unavoidable and almost necessary for understanding social behavior. The section on reflexivity later in this guide will highlight how different perspectives among researchers and human subjects are addressed in qualitative research. That said, bias in excess can place the credibility of a study's findings into serious question. Scholars who read your research need to know what new knowledge you are generating, how it was generated, and why the knowledge you present should be considered persuasive. With that in mind, let's look at how bias can be identified and, where it interferes with research, minimized.
How do you identify bias in research?
Identifying bias involves a critical examination of your entire research study, involving the formulation of the research question and hypothesis, the selection of study participants, the methods for data collection, and the analysis and interpretation of data. Researchers need to assess whether each stage has been influenced by bias that may have skewed the results. Tools such as bias checklists or guidelines, peer review, and reflexivity (reflecting on one's own biases) can be instrumental in identifying bias.
How do you identify research bias?
Identifying research bias often involves careful scrutiny of the research methodology and the researcher's interpretations. Was the sample of participants relevant to the research question? Were the interview or survey questions leading? Were there any conflicts of interest that could have influenced the results? It also requires an understanding of the different types of bias and how they might manifest in a research context. Does the bias occur in the data collection process or when the researcher is analyzing data?
Research transparency requires a careful accounting of how the study was designed, conducted, and analyzed. In qualitative research involving human subjects, the researcher is responsible for documenting the characteristics of the research population and research context. With respect to research methods, the procedures and instruments used to collect and analyze data are described in as much detail as possible.
While describing study methodologies and research participants in painstaking detail may sound cumbersome, a clear and detailed description of the research design is necessary for good research. Without this level of detail, it is difficult for your research audience to identify whether bias exists, where bias occurs, and to what extent it may threaten the credibility of your findings.
How to recognize bias in a study?
Recognizing bias in a study requires a critical approach. The researcher should question every step of the research process: Was the sample of participants selected with care? Did the data collection methods encourage open and sincere responses? Did personal beliefs or expectations influence the interpretation of the results? External peer reviews can also be helpful in recognizing bias, as others might spot potential issues that the original researcher missed.
The subsequent sections of this chapter will delve into the impacts of research bias and strategies to avoid it. Through these discussions, researchers will be better equipped to handle bias in their work and contribute to building more credible knowledge.
Unconscious biases, also known as implicit biases, are attitudes or stereotypes that influence our understanding, actions, and decisions in an unconscious manner. These biases can inadvertently infiltrate the research process, skewing the results and conclusions. This section aims to delve deeper into understanding unconscious bias, its impact on research, and strategies to mitigate it.
What is unconscious bias?
Unconscious bias refers to prejudices or social stereotypes about certain groups that individuals form outside their conscious awareness. Everyone holds unconscious beliefs about various social and identity groups, and these biases stem from a tendency to organize social worlds into categories.
How does unconscious bias infiltrate research?
Unconscious bias can infiltrate research in several ways. It can affect how researchers formulate their research questions or hypotheses, how they interact with participants, their data collection methods, and how they interpret their data. For instance, a researcher might unknowingly favor participants who share similar characteristics with them, which could lead to biased results.
Implications of unconscious bias
The implications of unconscious research bias are far-reaching. It can compromise the validity of research findings, influence the choice of research topics, and affect peer review processes. Unconscious bias can also lead to a lack of diversity in research, which can severely limit the value and impact of the findings.
Strategies to mitigate unconscious research bias
While it's challenging to completely eliminate unconscious bias, several strategies can help mitigate its impact. These include being aware of potential unconscious biases, practicing reflexivity, seeking diverse perspectives for your study, and engaging in regular bias-checking activities, such as bias training and peer debriefing.
By understanding and acknowledging unconscious bias, researchers can take steps to limit its impact on their work, leading to more robust findings.
Why is researcher bias an issue?
Research bias is a pervasive issue that researchers must diligently consider and address. It can significantly impact the credibility of findings. Here, we break down the ramifications of bias into two key areas.
How bias affects validity
Research validity refers to the accuracy of the study findings, or the coherence between the researcher’s findings and the participants’ actual experiences. When bias sneaks into a study, it can distort findings and move them further away from the realities that were shared by the research participants. For example, if a researcher's personal beliefs influence their interpretation of data, the resulting conclusions may not reflect what the data show or what participants experienced.
The transferability problem
Transferability is the extent to which your study's findings can be applied beyond the specific context or sample studied. Applying knowledge from one context to a different context is how we can progress and make informed decisions. In quantitative research, the generalizability of a study is a key component that shapes the potential impact of the findings. In qualitative research, all data and knowledge that is produced is understood to be embedded within a particular context, so the notion of generalizability takes on a slightly different meaning. Rather than assuming that the study participants are statistically representative of the entire population, qualitative researchers can reflect on which aspects of their research context bear the most weight on their findings and how these findings may be transferable to other contexts that share key similarities.
How does bias affect research?
Research bias, if not identified and mitigated, can significantly impact research outcomes. The ripple effects of research bias extend beyond individual studies, impacting the body of knowledge in a field and influencing policy and practice. Here, we delve into three specific ways bias can affect research.
Distortion of research results
Bias can lead to a distortion of your study's findings. For instance, confirmation bias can cause a researcher to focus on data that supports their interpretation while disregarding data that contradicts it. This can skew the results and create a misleading picture of the phenomenon under study.
Undermining scientific progress
When research is influenced by bias, it not only misrepresents participants’ realities but can also impede scientific progress. Biased studies can lead researchers down the wrong path, resulting in wasted resources and efforts. Moreover, it could contribute to a body of literature that is skewed or inaccurate, misleading future research and theories.
Influencing policy and practice based on flawed findings
Research often informs policy and practice. If the research is biased, it can lead to the creation of policies or practices that are ineffective or even harmful. For example, a study with selection bias might conclude that a certain intervention is effective, leading to its broad implementation. However, suppose the transferability of the study's findings was not carefully considered. In that case, it may be risky to assume that the intervention will work as well in different populations, which could lead to ineffective or inequitable outcomes.
While it's almost impossible to eliminate bias in research entirely, it's crucial to mitigate its impact as much as possible. By employing thoughtful strategies at every stage of research, we can strive towards rigor and transparency, enhancing the quality of our findings. This section will delve into specific strategies for avoiding bias.
How do you know if your research is biased?
Determining whether your research is biased involves a careful review of your research design, data collection, analysis, and interpretation. It might require you to reflect critically on your own biases and expectations and how these might have influenced your research. External peer reviews can also be instrumental in spotting potential bias.
Strategies to mitigate bias
Minimizing bias involves careful planning and execution at all stages of a research study. These strategies could include formulating clear, unbiased research questions, ensuring that your sample meaningfully represents the research problem you are studying, crafting unbiased data collection instruments, and employing systematic data analysis techniques. Transparency and reflexivity throughout the process can also help minimize bias.
Mitigating bias in data collection
To mitigate bias in data collection, ensure your questions are clear, neutral, and not leading. Triangulation, or using multiple methods or data sources, can also help to reduce bias and increase the credibility of your findings.
Mitigating bias in data analysis
During data analysis, maintaining a high level of rigor is crucial. This might involve using systematic coding schemes in qualitative research or appropriate statistical tests in quantitative research. Regularly questioning your interpretations and considering alternative explanations can help reduce bias. Peer debriefing, where you discuss your analysis and interpretations with colleagues, can also be a valuable strategy.
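One small, concrete check that supports systematic coding is to compare how often two coders assign the same code to the same excerpt. The sketch below uses hypothetical codes and computes simple percent agreement; chance-corrected measures such as Cohen’s kappa are commonly used in the same way.

```python
# Quick intercoder agreement check on hypothetical codes for six excerpts.
coder_a = ["barrier", "barrier", "motivation", "support", "barrier", "support"]
coder_b = ["barrier", "motivation", "motivation", "support", "barrier", "barrier"]

agreement = sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)
print(f"simple percent agreement: {agreement:.0%}")
```

Low agreement is not a verdict on either coder; it is a prompt to revisit the coding scheme together before the analysis moves on.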
By using these strategies, researchers can significantly reduce the impact of bias on their research, enhancing the quality and credibility of their findings and contributing to a more robust and meaningful body of knowledge.
Impact of cultural bias in research
Cultural bias is the tendency to interpret and judge phenomena by standards inherent to one's own culture. Given the increasingly multicultural and global nature of research, understanding and addressing cultural bias is paramount. This section will explore the concept of cultural bias, its impacts on research, and strategies to mitigate it.
What is cultural bias in research?
Cultural bias refers to the potential for a researcher's cultural background, experiences, and values to influence the research process and findings. This can occur consciously or unconsciously and can lead to misinterpretation of data, unfair representation of cultures, and biased conclusions.
How does cultural bias infiltrate research?
Cultural bias can infiltrate research at various stages. It can affect the framing of research questions, the design of the study, the methods of data collection, and the interpretation of results. For instance, a researcher might unintentionally design a study that does not consider the cultural context of the participants, leading to a biased understanding of the phenomenon being studied.
Implications of cultural bias
The implications of cultural bias are profound. Cultural bias can skew your findings, limit the transferability of results, and contribute to cultural misunderstandings and stereotypes. This can ultimately lead to inaccurate or ethnocentric conclusions, further perpetuating cultural bias and inequities.
As a result, many social science fields like sociology and anthropology have been critiqued for cultural biases in research. Some of the earliest research inquiries in anthropology, for example, have had the potential to reduce entire cultures to simplistic stereotypes when compared to mainstream norms. A contemporary researcher respecting ethical and cultural boundaries, on the other hand, should seek to properly place their understanding of social and cultural practices in sufficient context without inappropriately characterizing them.
Strategies to mitigate cultural bias
Mitigating cultural bias requires a concerted effort throughout the research study. These efforts could include educating oneself about other cultures, being aware of one's own cultural biases, incorporating culturally diverse perspectives into the research process, and being sensitive and respectful of cultural differences. It might also involve including team members with diverse cultural backgrounds or seeking external cultural consultants to challenge assumptions and provide alternative perspectives.
By acknowledging and addressing cultural bias, researchers can contribute to more culturally competent, equitable, and valid research. This not only enriches the scientific body of knowledge but also promotes cultural understanding and respect.
Keep in mind that bias is a force to be mitigated, not a phenomenon that can be eliminated altogether, and the subjectivities of each person are what make our world so complex and interesting. As things are continuously changing and adapting, research knowledge is also continuously being updated as we further develop our understanding of the world around us.
Confronting Bias
Bias in Research
Understanding research bias is important for several reasons: first, bias exists in all research, across research designs, and is difficult to eliminate; second, bias can occur at each stage of the research process; third, bias impacts the validity and reliability of study findings, and misinterpretation of data can have important consequences for practice. The controversial study that suggested a link between the measles-mumps-rubella vaccine and autism in children resulted in a rare retraction of the published study after media reports highlighted significant bias in the research process. Bias occurred on several levels: the process of selecting participants was misrepresented; the sample size was too small to infer any firm conclusion from the data analysis; and the results were overstated, suggesting caution against widespread vaccination and an urgent need for further research. However, in the time between the original publication and later research refuting the original findings, the uptake of the measles-mumps-rubella vaccine in Britain declined, resulting in a 25-fold increase in measles in the 10-year period following the original publication.
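On the sample-size point specifically, a prospective power calculation shows roughly how many participants a between-group comparison needs before firm conclusions are defensible. The sketch below is illustrative only (it assumes the statsmodels Python package and a hypothetical medium effect size) and is not based on the vaccine study’s actual data.

```python
# Illustrative sample-size calculation for a two-group comparison.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"approximately {n_per_group:.0f} participants per group for a medium effect")
```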
Design Bias
Researchers may engage in poorly designed research, which could increase the likelihood of bias. Poor research design may occur when the research questions and aims are not aligned with the research methods, or when researchers choose a biased research question.
Selection or Participant Bias
Research which relies on recruiting or selecting participants may result in selection or participant bias in a number of ways. For instance, participant recruitment might unintentionally target or exclude a specific population, or researchers may not appropriately account for participant withdrawal.
Analysis Bias
Researchers may unknowingly bias their results during data analysis by looking for or focusing on results that support their hypotheses or personal beliefs.
Publication Bias
Not all research articles are published. Publication or reporting bias occurs when publishers are more likely to publish articles showing positive results or statistically significant findings. Research showing negative results may be equally important to the contribution of knowledge in the field but may be less likely to be published.
Conflict of Interest
Bias in research may occur when researchers have a conflict of interest: a personal interest that conflicts with their professional obligations. Researchers should always be transparent in disclosing how their work was funded and what, if any, conflicts of interest exist.
This content was inspired and informed by the following resources: Smith J, Noble H. Bias in research. Evidence-Based Nursing 2014;17:100-101; Research Bias; Academic Integrity: Avoiding Plagiarism and Understanding Research Ethics: Research Ethics (University of Pittsburgh Libraries)
Words to Know
Expectancy Effect -- A particular type of experimenter effect in which the expectations of the experimenter as to the likely outcome of the experiment act as a self-fulfilling prophecy, biasing the results in the direction of the expectation.
Experimenter Effect -- A biasing effect on the results of an experiment caused by expectations or preconceptions on the part of the experimenter. Also called experimenter bias.
Response Bias -- In psychometrics, any systematic tendency of a respondent to choose a particular response category in a multiple-choice questionnaire for an extraneous reason, unrelated to the variable that the response is supposed to indicate but related to the content or meaning of the question.
Definitions from Colman, A. (2015). A Dictionary of Psychology. Oxford University Press.
Additional Reading
- Bias in research Smith J, Noble H. Bias in research. Evidence-Based Nursing 2014;17:100-101.
- Bias in Research Simundić, A. (2013). Bias in research. Biochemia Medica, 23(1), 12. doi:10.11613/BM.2013.003
- Big Pharma Entanglement with Biomedical Science in James, Jack. The Health of Populations: Beyond Medicine, Elsevier Science & Technology, 2015.
- How to Limit Bias in Experimental Research in Experimental Research Methods in Orthopedics and Trauma, edited by Hamish Simpson, and Peter Augat, Thieme Medical Publishers, Incorporated, 2015.
- Sources of method bias in social science research and recommendations on how to control it Podsakoff, P. M., MacKenzie, S. B., & Podsakoff, N. P. (2012). Sources of method bias in social science research and recommendations on how to control it. Annual Review of Psychology, 63(1), 539-569. doi:10.1146/annurev-psych-120710-100452
Open access. Published: 11 December 2020
Quantifying and addressing the prevalence and bias of study designs in the environmental and social sciences
- Alec P. Christie ORCID: orcid.org/0000-0002-8465-8410 1 ,
- David Abecasis ORCID: orcid.org/0000-0002-9802-8153 2 ,
- Mehdi Adjeroud 3 ,
- Juan C. Alonso ORCID: orcid.org/0000-0003-0450-7434 4 ,
- Tatsuya Amano ORCID: orcid.org/0000-0001-6576-3410 5 ,
- Alvaro Anton ORCID: orcid.org/0000-0003-4108-6122 6 ,
- Barry P. Baldigo ORCID: orcid.org/0000-0002-9862-9119 7 ,
- Rafael Barrientos ORCID: orcid.org/0000-0002-1677-3214 8 ,
- Jake E. Bicknell ORCID: orcid.org/0000-0001-6831-627X 9 ,
- Deborah A. Buhl 10 ,
- Just Cebrian ORCID: orcid.org/0000-0002-9916-8430 11 ,
- Ricardo S. Ceia ORCID: orcid.org/0000-0001-7078-0178 12 , 13 ,
- Luciana Cibils-Martina ORCID: orcid.org/0000-0002-2101-4095 14 , 15 ,
- Sarah Clarke 16 ,
- Joachim Claudet ORCID: orcid.org/0000-0001-6295-1061 17 ,
- Michael D. Craig 18 , 19 ,
- Dominique Davoult 20 ,
- Annelies De Backer ORCID: orcid.org/0000-0001-9129-9009 21 ,
- Mary K. Donovan ORCID: orcid.org/0000-0001-6855-0197 22 , 23 ,
- Tyler D. Eddy 24 , 25 , 26 ,
- Filipe M. França ORCID: orcid.org/0000-0003-3827-1917 27 ,
- Jonathan P. A. Gardner ORCID: orcid.org/0000-0002-6943-2413 26 ,
- Bradley P. Harris 28 ,
- Ari Huusko 29 ,
- Ian L. Jones 30 ,
- Brendan P. Kelaher 31 ,
- Janne S. Kotiaho ORCID: orcid.org/0000-0002-4732-784X 32 , 33 ,
- Adrià López-Baucells ORCID: orcid.org/0000-0001-8446-0108 34 , 35 , 36 ,
- Heather L. Major ORCID: orcid.org/0000-0002-7265-1289 37 ,
- Aki Mäki-Petäys 38 , 39 ,
- Beatriz Martín 40 , 41 ,
- Carlos A. Martín 8 ,
- Philip A. Martin 1 , 42 ,
- Daniel Mateos-Molina ORCID: orcid.org/0000-0002-9383-0593 43 ,
- Robert A. McConnaughey ORCID: orcid.org/0000-0002-8537-3695 44 ,
- Michele Meroni 45 ,
- Christoph F. J. Meyer ORCID: orcid.org/0000-0001-9958-8913 34 , 35 , 46 ,
- Kade Mills 47 ,
- Monica Montefalcone 48 ,
- Norbertas Noreika ORCID: orcid.org/0000-0002-3853-7677 49 , 50 ,
- Carlos Palacín 4 ,
- Anjali Pande 26 , 51 , 52 ,
- C. Roland Pitcher ORCID: orcid.org/0000-0003-2075-4347 53 ,
- Carlos Ponce 54 ,
- Matt Rinella 55 ,
- Ricardo Rocha ORCID: orcid.org/0000-0003-2757-7347 34 , 35 , 56 ,
- María C. Ruiz-Delgado 57 ,
- Juan J. Schmitter-Soto ORCID: orcid.org/0000-0003-4736-8382 58 ,
- Jill A. Shaffer ORCID: orcid.org/0000-0003-3172-0708 10 ,
- Shailesh Sharma ORCID: orcid.org/0000-0002-7918-4070 59 ,
- Anna A. Sher ORCID: orcid.org/0000-0002-6433-9746 60 ,
- Doriane Stagnol 20 ,
- Thomas R. Stanley 61 ,
- Kevin D. E. Stokesbury 62 ,
- Aurora Torres 63 , 64 ,
- Oliver Tully 16 ,
- Teppo Vehanen ORCID: orcid.org/0000-0003-3441-6787 65 ,
- Corinne Watts 66 ,
- Qingyuan Zhao 67 &
- William J. Sutherland 1 , 42
Nature Communications, volume 11, Article number: 6377 (2020)
Subjects: Environmental impact, Scientific community, Social sciences
Building trust in science and evidence-based decision-making depends heavily on the credibility of studies and their findings. Researchers employ many different study designs that vary in their risk of bias to evaluate the true effect of interventions or impacts. Here, we empirically quantify, on a large scale, the prevalence of different study designs and the magnitude of bias in their estimates. Randomised designs and controlled observational designs with pre-intervention sampling were used by just 23% of intervention studies in biodiversity conservation, and 36% of intervention studies in social science. We demonstrate, through pairwise within-study comparisons across 49 environmental datasets, that these types of designs usually give less biased estimates than simpler observational designs. We propose a model-based approach to combine study estimates that may suffer from different levels of study design bias, discuss the implications for evidence synthesis, and how to facilitate the use of more credible study designs.
Introduction
The ability of science to reliably guide evidence-based decision-making hinges on the accuracy and credibility of studies and their results 1 , 2 . Well-designed, randomised experiments are widely accepted to yield more credible results than non-randomised, ‘observational studies’ that attempt to approximate and mimic randomised experiments 3 . Randomisation is a key element of study design that is widely used across many disciplines because of its ability to remove confounding biases (through random assignment of the treatment or impact of interest 4 , 5 ). However, ethical, logistical, and economic constraints often prevent the implementation of randomised experiments, whereas non-randomised observational studies have become popular as they take advantage of historical data for new research questions, larger sample sizes, less costly implementation, and more relevant and representative study systems or populations 6 , 7 , 8 , 9 . Observational studies nevertheless face the challenge of accounting for confounding biases without randomisation, which has led to innovations in study design.
We define ‘study design’ as an organised way of collecting data. Importantly, we distinguish between data collection and statistical analysis (as opposed to other authors 10 ) because of the belief that bias introduced by a flawed design is often much more important than bias introduced by statistical analyses. This was emphasised by Light, Singer & Willet 11 (p. 5): “You can’t fix by analysis what you bungled by design…”; and Rubin 3 : “Design trumps analysis.” Nevertheless, the importance of study design has often been overlooked in debates over the inability of researchers to reproduce the original results of published studies (so-called ‘reproducibility crises’ 12 , 13 ) in favour of other issues (e.g., p-hacking 14 and Hypothesizing After Results are Known or ‘HARKing’ 15 ).
To demonstrate the importance of study designs, we can use the following decomposition of estimation error 16:

$$\text{Estimation error} \;=\; \text{Design bias} \;+\; \text{Modelling bias} \;+\; \text{Statistical noise}. \qquad\qquad (1)$$

This demonstrates that even if we improve the quality of modelling and analysis (to reduce modelling bias through a better bias-variance trade-off 17) or increase sample size (to reduce statistical noise), we cannot remove the intrinsic bias introduced by the choice of study design (design bias) unless we collect the data in a different way. The importance of study design in determining the levels of bias in study results therefore cannot be overstated.
For the purposes of this study we consider six commonly used study designs; differences and connections can be visualised in Fig. 1 . There are three major components that allow us to define these designs: randomisation, sampling before and after the impact of interest occurs, and the use of a control group.
Fig. 1: A hypothetical study set-up is shown where the abundance of birds in three impact and control replicates (e.g., fields represented by blocks in a row) are monitored before and after an impact (e.g., ploughing) that occurs in year zero. Different colours represent each study design and illustrate how replicates are sampled. Approaches for calculating an estimate of the true effect of the impact for each design are also shown, along with synonyms from different disciplines.
Of the non-randomised observational designs, the Before-After Control-Impact (BACI) design uses a control group and samples before and after the impact occurs (i.e., in the ‘before-period’ and the ‘after-period’). Its rationale is to explicitly account for pre-existing differences between the impact group (exposed to the impact) and control group in the before-period, which might otherwise bias the estimate of the impact’s true effect 6 , 18 , 19 .
The BACI design improves upon several other commonly used observational study designs, of which there are two uncontrolled designs: After, and Before-After (BA). An After design monitors an impact group in the after-period, while a BA design compares the state of the impact group between the before- and after-periods. Both designs can be expected to yield poor estimates of the impact’s true effect (large design bias; Equation (1)) because changes in the response variable could have occurred without the impact (e.g., due to natural seasonal changes; Fig. 1 ).
The other observational design is Control-Impact (CI), which compares the impact group and control group in the after-period (Fig. 1 ). This design may suffer from design bias introduced by pre-existing differences between the impact group and control group in the before-period; bias that the BACI design was developed to account for 20 , 21 . These differences have many possible sources, including experimenter bias, logistical and environmental constraints, and various confounding factors (variables that change the propensity of receiving the impact), but can be adjusted for through certain data pre-processing techniques such as matching and stratification 22 .
Among the randomised designs, the most commonly used are counterparts to the observational CI and BACI designs: Randomised Control-Impact (R-CI) and Randomised Before-After Control-Impact (R-BACI) designs. The R-CI design, often termed ‘Randomised Controlled Trials’ (RCTs) in medicine and hailed as the ‘gold standard’ 23 , 24 , removes any pre-impact differences in a stochastic sense, resulting in zero design bias (Equation ( 1 )). Similarly, the R-BACI design should also have zero design bias, and the impact group measurements in the before-period could be used to improve the efficiency of the statistical estimator. No randomised equivalents exist of After or BA designs as they are uncontrolled.
It is important to briefly note that there is debate over two major statistical methods that can be used to analyse data collected using BACI and R-BACI designs, and which is superior at reducing modelling bias 25 (Equation (1)). These statistical methods are: (i) Differences in Differences (DiD) estimator; and (ii) covariance adjustment using the before-period response, which is an extension of Analysis of Covariance (ANCOVA) for generalised linear models — herein termed ‘covariance adjustment’ (Fig. 1 ). These estimators rely on different assumptions to obtain unbiased estimates of the impact’s true effect. The DiD estimator assumes that the control group response accurately represents the impact group response had it not been exposed to the impact (‘parallel trends’ 18 , 26 ) whereas covariance adjustment assumes there are no unmeasured confounders and linear model assumptions hold 6 , 27 .
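To make the two estimators concrete, here is a minimal R sketch, not taken from this paper, applied to a hypothetical single-response BACI data frame `dat` with columns `y` (a count response), `period` (Before/After), `treatment` (Control/Impact), and `site`; all names are illustrative assumptions.

```r
# Hypothetical BACI data: y = count response, period = Before/After,
# treatment = Control/Impact, site = sampling unit identifier.

# 1) Difference in Differences (DiD): the period:treatment interaction term
#    estimates the effect, relying on the parallel-trends assumption.
did_fit <- glm(y ~ period * treatment,
               family = poisson(link = "log"), data = dat)

# 2) Covariance adjustment (CA): model the after-period response on treatment
#    plus the before-period response at the same site as a lagged covariate,
#    relying on the no-unmeasured-confounders assumption.
#    (For simplicity this assumes one before-period measurement per site.)
after  <- subset(dat, period == "After")
before <- subset(dat, period == "Before")
after$y_before <- before$y[match(after$site, before$site)]
ca_fit <- glm(y ~ treatment + y_before,
              family = poisson(link = "log"), data = after)

summary(did_fit)  # DiD effect estimate: the interaction coefficient
summary(ca_fit)   # CA effect estimate: the treatment coefficient
```

With a log link, each coefficient of interest can be read as a log response ratio, which is the standardised effect measure used later in the Methods.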
From both theory and Equation (1), with similar sample sizes, randomised designs (R-BACI and R-CI) are expected to be less biased than controlled, observational designs with sampling in the before-period (BACI), which in turn should be superior to observational designs without sampling in the before-period (CI) or without a control group (BA and After designs 7 , 28 ). Between randomised designs, we might expect that an R-BACI design performs better than a R-CI design because utilising extra data before the impact may improve the efficiency of the statistical estimator by explicitly characterising pre-existing differences between the impact group and control group.
Given the likely differences in bias associated with different study designs, concerns have been raised over the use of poorly designed studies in several scientific disciplines 7 , 29 , 30 , 31 , 32 , 33 , 34 , 35 . Some disciplines, such as the social and medical sciences, commonly undertake direct comparisons of results obtained by randomised and non-randomised designs within a single study 36 , 37 , 38 or between multiple studies (between-study comparisons 39 , 40 , 41 ) to specifically understand the influence of study designs on research findings. However, within-study comparisons are limited in their scope (e.g., a single study 42 , 43 ) and between-study comparisons can be confounded by variability in context or study populations 44 . Overall, we lack quantitative estimates of the prevalence of different study designs and the levels of bias associated with their results.
In this work, we aim to first quantify the prevalence of different study designs in the social and environmental sciences. To fill this knowledge gap, we take advantage of summaries for several thousand biodiversity conservation intervention studies in the Conservation Evidence database 45 ( www.conservationevidence.com ) and social intervention studies in systematic reviews by the Campbell Collaboration ( www.campbellcollaboration.org ). We then quantify the levels of bias in estimates obtained by different study designs (R-BACI, R-CI, BACI, BA, and CI) by applying a hierarchical model to approximately 1000 within-study comparisons across 49 raw environmental datasets from a range of fields. We show that R-BACI, R-CI and BACI designs are poorly represented in studies testing biodiversity conservation and social interventions, and that these types of designs tend to give less biased estimates than simpler observational designs. We propose a model-based approach to combine study estimates that may suffer from different levels of study design bias, discuss the implications for evidence synthesis, and how to facilitate the use of more credible study designs.
Prevalence of study designs
We found that the biodiversity-conservation (Conservation Evidence) and social-science (Campbell Collaboration) literature had similarly high proportions of intervention studies that used CI designs and After designs, but low proportions that used R-BACI, BACI, or BA designs (Fig. 2). There were slightly higher proportions of R-CI designs used by intervention studies in social-science systematic reviews than in the biodiversity-conservation literature (Fig. 2). The R-BACI, R-CI, and BACI designs made up 23% of intervention studies for biodiversity conservation, and 36% of intervention studies for social science.
Fig. 2: Intervention studies from the biodiversity-conservation literature were screened from the Conservation Evidence database (n = 4260 studies) and studies from the social-science literature were screened from 32 Campbell Collaboration systematic reviews (n = 1009 studies; note that studies excluded by these reviews based on their study design were still counted). Percentages for the social-science literature were calculated for each systematic review (blue data points) and then averaged across all 32 systematic reviews (blue bars and black vertical lines represent means and 95% confidence intervals, respectively). Percentages for the biodiversity-conservation literature are absolute values (shown as green bars) calculated from the entire Conservation Evidence database (after excluding any reviews). Source data are provided as a Source Data file. BA = before-after, CI = control-impact, BACI = before-after-control-impact, R-BACI = randomised BACI, R-CI = randomised CI.
Influence of different study designs on study results
In non-randomised datasets, we found that estimates of BACI (with covariance adjustment) and CI designs were very similar, while the point estimates for most other designs often differed substantially in their magnitude and sign. We found similar results in randomised datasets for R-BACI (with covariance adjustment) and R-CI designs. For ~30% of responses, in both non-randomised and randomised datasets, study design estimates differed in their statistical significance (i.e., p < 0.05 versus p ≥ 0.05), except for estimates of (R-)BACI (with covariance adjustment) and (R-)CI designs (Table 1; Fig. 3). It was rare for the 95% confidence intervals of different designs' estimates to not overlap – except when comparing estimates of BA designs to (R-)BACI (with covariance adjustment) and (R-)CI designs (Table 1). It was even rarer for estimates of different designs to have significantly different signs (i.e., one estimate with entirely negative confidence intervals versus one with entirely positive confidence intervals; Table 1, Fig. 3). Overall, point estimates often differed greatly in their magnitude and, to a lesser extent, in their sign between study designs, but did not differ as greatly when accounting for the uncertainty around point estimates – except in terms of their statistical significance.
Fig. 3: t-statistics were obtained from two-sided t-tests of estimates obtained by each design for different responses in each dataset using Generalised Linear Models (see Methods). For randomised datasets, BACI and CI axis labels refer to R-BACI and R-CI designs (denoted by 'R-'). DiD = Difference in Differences; CA = covariance adjustment. Lines at t-statistic values of 1.96 denote boundaries between cells, and colours of points indicate differences in direction and statistical significance (p < 0.05; grey = same sign and significance, orange = same sign but difference in significance, red = different sign and significance). Numbers refer to the number of responses in each cell. Source data are provided as a Source Data file. BA = Before-After, CI = Control-Impact, BACI = Before-After-Control-Impact.
Levels of bias in estimates of different study designs
We modelled study design bias using a random effect across datasets in a hierarchical Bayesian model; σ is the standard deviation of the bias term, and assuming bias is randomly distributed across datasets and is on average zero, larger values of σ will indicate a greater magnitude of bias (see Methods). We found that, for randomised datasets, estimates of both R-BACI (using covariance adjustment; CA) and R-CI designs were affected by negligible amounts of bias (very small values of σ; Table 2 ). When the R-BACI design used the DiD estimator, it suffered from slightly more bias (slightly larger values of σ), whereas the BA design had very high bias when applied to randomised datasets (very large values of σ; Table 2 ). There was a highly positive correlation between the estimates of R-BACI (using covariance adjustment) and R-CI designs (Ω[R-BACI CA, R-CI] was close to 1; Table 2 ). Estimates of R-BACI using the DiD estimator were also positively correlated with estimates of R-BACI using covariance adjustment and R-CI designs (moderate positive mean values of Ω[R-BACI CA, R-BACI DiD] and Ω[R-BACI DiD, R-CI]; Table 2 ).
For non-randomised datasets, controlled designs (BACI and CI) were substantially less biased (far smaller values of σ) than the uncontrolled BA design (Table 2 ). A BACI design using the DiD estimator was slightly less biased than the BACI design using covariance adjustment, which was, in turn, slightly less biased than the CI design (Table 2 ).
Standard errors estimated by the hierarchical Bayesian model were reasonably accurate for the randomised datasets (see λ in Methods and Table 2 ), whereas there was some underestimation of standard errors and lack-of-fit for non-randomised datasets.
Our approach provides a principled way to quantify the levels of bias associated with different study designs. We found that randomised study designs (R-BACI and R-CI) and observational BACI designs are poorly represented in the environmental and social sciences; collectively, descriptive case studies (the After design), the uncontrolled, observational BA design, and the controlled, observational CI design made up a substantially greater proportion of intervention studies (Fig. 2 ). And yet R-BACI, R-CI and BACI designs were found to be quantifiably less biased than other observational designs.
As expected the R-CI and R-BACI designs (using a covariance adjustment estimator) performed well; the R-BACI design using a DiD estimator performed slightly less well, probably because the differencing of pre-impact data by this estimator may introduce additional statistical noise compared to covariance adjustment, which controls for these data using a lagged regression variable. Of the observational designs, the BA design performed very poorly (both when analysing randomised and non-randomised data) as expected, being uncontrolled and therefore prone to severe design bias 7 , 28 . The CI design also tended to be more biased than the BACI design (using a DiD estimator) due to pre-existing differences between the impact and control groups. For BACI designs, we recommend that the underlying assumptions of DiD and CA estimators are carefully considered before choosing to apply them to data collected for a specific research question 6 , 27 . Their levels of bias were negligibly different and their known bracketing relationship suggests they will typically give estimates with the same sign, although their tendency to over- or underestimate the true effect will depend on how well the underlying assumptions of each are met (most notably, parallel trends for DiD and no unmeasured confounders for CA; see Introduction) 6 , 27 . Overall, these findings demonstrate the power of large within-study comparisons to directly quantify differences in the levels of bias associated with different designs.
We must acknowledge that the assumptions of our hierarchical model (that the bias for each design (j) is on average zero and normally distributed) cannot be verified without gold standard randomised experiments and that, for observational designs, the model was overdispersed (potentially due to underestimation of statistical error by GLM(M)s or positively correlated design biases). The exact values of our hierarchical model should therefore be treated with appropriate caution, and future research is needed to refine and improve our approach to quantify these biases more precisely. Responses within datasets may also not be independent as multiple species could interact; therefore, the estimates analysed by our hierarchical model are statistically dependent on each other, and although we tried to account for this using a correlation matrix (see Methods, Eq. ( 3 )), this is a limitation of our model. We must also recognise that we collated datasets using non-systematic searches 46 , 47 and therefore our analysis potentially exaggerates the intrinsic biases of observational designs (i.e., our data may disproportionately reflect situations where the BACI design was chosen to account for confounding factors). We nevertheless show that researchers were wise to use the BACI design because it was less biased than CI and BA designs across a wide range of datasets from various environmental systems and locations. Without undertaking costly and time-consuming pre-impact sampling and pilot studies, researchers are also unlikely to know the levels of bias that could affect their results. Finally, we did not consider sample size, but it is likely that researchers might use larger sample sizes for CI and BA designs than BACI designs. This is, however, unlikely to affect our main conclusions because larger sample sizes could increase type I errors (false positive rate) by yielding more precise, but biased estimates of the true effect 28 .
Our analyses provide several empirically supported recommendations for researchers designing future studies to assess an impact of interest. First, using a controlled and/or randomised design (if possible) was shown to strongly reduce the level of bias in study estimates. Second, when observational designs must be used (as randomisation is not feasible or too costly), we urge researchers to choose the BACI design over other observational designs—and when that is not possible, to choose the CI design over the uncontrolled BA design. We acknowledge that limited resources, short funding timescales, and ethical or logistical constraints 48 may force researchers to use the CI design (if randomisation and pre-impact sampling are impossible) or the BA design (if appropriate controls cannot be found 28 ). To facilitate the usage of less biased designs, longer-term investments in research effort and funding are required 43 . Far greater emphasis on study designs in statistical education 49 and better training and collaboration between researchers, practitioners and methodologists, is needed to improve the design of future studies; for example, potentially improving the CI design by pairing or matching the impact group and control group 22 , or improving the BA design using regression discontinuity methods 48 , 50 . Where the choice of study design is limited, researchers must transparently communicate the limitations and uncertainty associated with their results.
Our findings also have wider implications for evidence synthesis, specifically the exclusion of certain observational study designs from syntheses (the 'rubbish in, rubbish out' concept 51 , 52 ). We believe that observational designs should be included in systematic reviews and meta-analyses, but that careful adjustments are needed to account for their potential biases. Exclusion of observational studies often results from subjective, checklist-based 'Risk of Bias' or quality assessments of studies (e.g., AMSTAR 2 53 , ROBINS-I 54 , or GRADE 55 ) that are not data-driven and often neglect to identify the actual direction, or quantify the magnitude, of possible bias introduced by observational studies when rating the quality of a review's recommendations. We also found that there was a small proportion of studies that used randomised designs (R-CI or R-BACI) or observational BACI designs (Fig. 2), suggesting that systematic reviews and meta-analyses risk excluding a substantial proportion of the literature and limiting the scope of their recommendations if such exclusion criteria are used 32 , 56 , 57 . This problem is compounded by the fact that, at least in conservation science, studies using randomised or BACI designs are strongly concentrated in Europe, Australasia, and North America 31 . Systematic reviews that rely on these few types of study designs are therefore likely to fail to provide decision makers outside of these regions with locally relevant recommendations that they prefer 58 . The Covid-19 pandemic has highlighted the difficulties in making locally relevant evidence-based decisions using studies conducted in different countries with different demographics and cultures, and on patients of different ages, ethnicities, genetics, and underlying health issues 59 . This problem is also acute for decision-makers working on biodiversity conservation in the tropical regions, where the need for conservation is arguably the greatest (i.e., where most of Earth's biodiversity exists 60 ) but they either have to rely on very few well-designed studies that are not locally relevant (i.e., have low generalisability), or more studies that are locally relevant but less well-designed 31 , 32 . Either option could lead decision-makers to take ineffective or inefficient decisions. In the long-term, improving the quality and coverage of scientific evidence and evidence syntheses across the world will help solve these issues, but shorter-term solutions to synthesising patchy evidence bases are required.
Our work furthers sorely needed research on how to combine evidence from studies that vary greatly in their design. Our approach is an alternative to conventional meta-analyses which tend to only weight studies by their sample size or the inverse of their variance 61 ; when studies vary greatly in their study design, simply weighting by inverse variance or sample size is unlikely to account for different levels of bias introduced by different study designs (see Equation (1)). For example, a BA study could receive a larger weight if it had lower variance than a BACI study, despite our results suggesting a BA study usually suffers from greater design bias. Our model provides a principled way to weight studies by both their variance and the likely amount of bias introduced by their study design; it is therefore a form of ‘bias-adjusted meta-analysis’ 62 , 63 , 64 , 65 , 66 . However, instead of relying on elicitation of subjective expert opinions on the bias of each study, we provide a data-driven, empirical quantification of study biases – an important step that was called for to improve such meta-analytic approaches 65 , 66 .
Future research is needed to refine our methodology, but our empirically grounded form of bias-adjusted meta-analysis could be implemented as follows: 1.) collate studies for the same true effect, their effect size estimates, standard errors, and the type of study design; 2.) enter these data into our hierarchical model, where effect size estimates share the same intercept (the true causal effect), a random effect term due to design bias (whose variance is estimated by the method we used), and a random effect term for statistical noise (whose variance is estimated by the reported standard error of studies); 3.) fit this model and estimate the shared intercept/true effect. Heuristically, this can be thought of as weighting studies by both their design bias and their sampling variance and could be implemented on a dynamic meta-analysis platform (such as metadataset.com 67 ). This approach has substantial potential to develop evidence synthesis in fields (such as biodiversity conservation 31 , 32 ) with patchy evidence bases, where reliably synthesising findings from studies that vary greatly in their design is a fundamental and unavoidable challenge.
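As a rough illustration of steps 1-3 (with invented numbers, and simplifying the model above by giving all designs a single shared bias variance rather than design-specific variances), a multilevel meta-analysis along these lines could be sketched with the metafor package:

```r
# Illustrative only: invented effect sizes and standard errors.
library(metafor)

dat <- data.frame(
  est    = c(0.42, 0.31, 0.75, 0.10, 0.55),        # step 1: reported effect sizes
  se     = c(0.10, 0.12, 0.20, 0.15, 0.18),        #         and their standard errors
  design = c("R-CI", "BACI", "BA", "CI", "BACI")   #         plus the study design used
)

# Step 2: a shared intercept (the true effect), a random effect for
# study-design bias, and known sampling variances V = se^2 for the
# statistical noise of each estimate.
fit <- rma.mv(yi = est, V = se^2, random = ~ 1 | design, data = dat)

# Step 3: the model intercept is the bias-adjusted estimate of the true effect.
summary(fit)
```

The intercept of this model plays the role of the shared true effect; a fuller implementation would estimate a separate bias variance for each design type, as in the hierarchical Bayesian model described in the Methods.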
Our study has highlighted an often overlooked aspect of debates over scientific reproducibility: that the credibility of studies is fundamentally determined by study design. Testing the effectiveness of conservation and social interventions is undoubtedly of great importance given the current challenges facing biodiversity and society in general and the serious need for more evidence-based decision-making 1 , 68 . And yet our findings suggest that quantifiably less biased study designs are poorly represented in the environmental and social sciences. Greater methodological training of researchers and funding for intervention studies, as well as stronger collaborations between methodologists and practitioners is needed to facilitate the use of less biased study designs. Better communication and reporting of the uncertainty associated with different study designs is also needed, as well as more meta-research (the study of research itself) to improve standards of study design 69 . Our hierarchical model provides a principled way to combine studies using a variety of study designs that vary greatly in their risk of bias, enabling us to make more efficient use of patchy evidence bases. Ultimately, we hope that researchers and practitioners testing interventions will think carefully about the types of study designs they use, and we encourage the evidence synthesis community to embrace alternative methods for combining evidence from heterogeneous sets of studies to improve our ability to inform evidence-based decision-making in all disciplines.
Quantifying the use of different designs
We compared the use of different study designs in the literature that quantitatively tested interventions between the fields of biodiversity conservation (4,260 studies collated by Conservation Evidence 45 ) and social science (1,009 studies found by 32 systematic reviews produced by the Campbell Collaboration: www.campbellcollaboration.org ).
Conservation Evidence is a database of intervention studies, each of which has quantitatively tested a conservation intervention (e.g., sowing strips of wildflower seeds on farmland to benefit birds), that is continuously being updated through comprehensive, manual searches of conservation journals for a wide range of fields in biodiversity conservation (e.g., amphibian, bird, peatland, and farmland conservation 45 ). To obtain the proportion of studies that used each design from Conservation Evidence, we simply extracted the type of study design from each study in the database in 2019 – the study design was determined using a standardised set of criteria; reviews were not included (Table 3 ). We checked if the designs reported in the database accurately reflected the designs in the original publication and found that for a random subset of 356 studies, 95.1% were accurately described.
Each systematic review produced by the Campbell Collaboration collates and analyses studies that test a specific social intervention; we collated systematic reviews that tested a variety of social interventions across several fields in the social sciences, including education, crime and justice, international development and social welfare (Supplementary Data 1 ). We retrieved systematic reviews produced by the Campbell Collaboration by searching their website ( www.campbellcollaboration.org ) for reviews published between 2013‒2019 (as of 8th September 2019) — we limited the date range as we could not go through every review. As we were interested in the use of study designs in the wider social-science literature, we only considered reviews (32 in total) that contained sufficient information on the number of included and excluded studies that used different study designs. Studies may be excluded from systematic reviews for several reasons, such as their relevance to the scope of the review (e.g., testing a relevant intervention) and their study design. We only considered studies if the sole reason for their exclusion from the systematic review was their study design – i.e., reviews clearly reported that the study was excluded because it used a particular study design, and not because of any other reason, such as its relevance to the review’s research questions. We calculated the proportion of studies that used each design in each systematic review (using the same criteria as for the biodiversity-conservation literature – see Table 3 ) and then averaged these proportions across all systematic reviews.
Within-study comparisons of different study designs
We wanted to make direct within-study comparisons between the estimates obtained by different study designs (e.g., see 38 , 70 , 71 for single within-study comparisons) for many different studies. If a dataset contains data collected using a BACI design, subsets of these data can be used to mimic the use of other study designs (a BA design using only data for the impact group, and a CI design using only data collected after the impact occurred). Similarly, if data were collected using a R-BACI design, subsets of these data can be used to mimic the use of a BA design and a R-CI design. Collecting BACI and R-BACI datasets would therefore allow us to make direct within-study comparisons of the estimates obtained by these designs.
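For illustration (not the authors' code, and using the same hypothetical column names as the sketch in the Introduction), the subsets that mimic each design could be taken from a BACI data frame as follows.

```r
# dat is a full BACI (or R-BACI) dataset with columns period and treatment.
baci_data <- dat                                  # BACI / R-BACI: all groups, both periods
ba_data   <- subset(dat, treatment == "Impact")   # BA: impact group only, before and after
ci_data   <- subset(dat, period == "After")       # CI / R-CI: both groups, after-period only
```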
We collated BACI and R-BACI datasets by searching the Web of Science Core Collection 72 which included the following citation indexes: Science Citation Index Expanded (SCI-EXPANDED) 1900-present; Social Sciences Citation Index (SSCI) 1900-present Arts & Humanities Citation Index (A&HCI) 1975-present; Conference Proceedings Citation Index - Science (CPCI-S) 1990-present; Conference Proceedings Citation Index - Social Science & Humanities (CPCI-SSH) 1990-present; Book Citation Index - Science (BKCI-S) 2008-present; Book Citation Index - Social Sciences & Humanities (BKCI-SSH) 2008-present; Emerging Sources Citation Index (ESCI) 2015-present; Current Chemical Reactions (CCR-EXPANDED) 1985-present (Includes Institut National de la Propriete Industrielle structure data back to 1840); Index Chemicus (IC) 1993-present. The following search terms were used: [‘BACI’] OR [‘Before-After Control-Impact’] and the search was conducted on the 18th December 2017. Our search returned 674 results, which we then refined by selecting only ‘Article’ as the document type and using only the following Web of Science Categories: ‘Ecology’, ‘Marine Freshwater Biology’, ‘Biodiversity Conservation’, ‘Fisheries’, ‘Oceanography’, ‘Forestry’, ‘Zoology’, Ornithology’, ‘Biology’, ‘Plant Sciences’, ‘Entomology’, ‘Remote Sensing’, ‘Toxicology’ and ‘Soil Science’. This left 579 results, which we then restricted to articles published since 2002 (15 years prior to search) to give us a realistic opportunity to obtain the raw datasets, thus reducing this number to 542. We were able to access the abstracts of 521 studies and excluded any that did not test the effect of an environmental intervention or threat using an R-BACI or BACI design with response measures related to the abundance (e.g., density, counts, biomass, cover), reproduction (reproductive success) or size (body length, body mass) of animals or plants. Many studies did not test a relevant metric (e.g., they measured species richness), did not use a BACI or R-BACI design, or did not test the effect of an intervention or threat — this left 96 studies for which we contacted all corresponding authors to ask for the raw dataset. We were able to fully access 54 raw datasets, but upon closer inspection we found that three of these datasets either: did not use a BACI design; did not use the metrics we specified; or did not provide sufficient data for our analyses. This left 51 datasets in total that we used in our preliminary analyses (Supplementary Data 2 ).
All the datasets were originally collected to evaluate the effect of an environmental intervention or impact. Most of them contained multiple response variables (e.g., different measures for different species, such as abundance or density for species A, B, and C). Within a dataset, we use the term “response” to refer to the estimation of the true effect of an impact on one response variable. There were 1,968 responses in total across 51 datasets. We then excluded 932 responses (resulting in the exclusion of one dataset) where one or more of the four time-period and treatment subsets (Before Control, Before Impact, After Control, and After Impact data) consisted of entirely zero measurements, or two or more of these subsets had more than 90% zero measurements. We also excluded one further dataset as it was the only one to not contain repeated measurements at sites in both the before- and after-periods. This was necessary to generate reliable standard errors when modelling these data. We modelled the remaining 1,036 responses from across 49 datasets (Supplementary Table 1 ).
We applied each study design to the appropriate components of each dataset using Generalised Linear Models (GLMs 73 , 74 ) because of their generality and ability to implement the statistical estimators of many different study designs. The model structure of GLMs was adjusted for each response in each dataset based on the study design specified, response measure and dataset structure (Supplementary Table 2 ). We quantified the effect of the time period for the BA design (After vs Before the impact) and the effect of the treatment type for the CI and R-CI designs (Impact vs Control) on the response variable (Supplementary Table 2 ). For BACI and R-BACI designs, we implemented two statistical estimators: 1.) a DiD estimator that estimated the true effect using an interaction term between time and treatment type; and 2.) a covariance adjustment estimator that estimated the true effect using a term for the treatment type with a lagged variable (Supplementary Table 2 ).
As there were large numbers of responses, we used general a priori rules to specify models for each response; this may have led to some model misspecification, but was unlikely to have substantially affected our pairwise comparison of estimates obtained by different designs. The error family of each GLM was specified based on the nature of the measure used and preliminary data exploration: count measures (e.g., abundance) = poisson; density measures (e.g., biomass or abundance per unit area) = quasipoisson, as data for these measures tended to be overdispersed; percentage measures (e.g., percentage cover) = quasibinomial; and size measures (e.g., body length) = gaussian.
We treated each year or season in which data were collected as independent observations because the implementation of a seasonal term in models is likely to vary on a case-by-case basis; this will depend on the research questions posed by each study and was not feasible for us to consider given the large number of responses we were modelling. The log link function was used for all models to generate a standardised log response ratio as an estimate of the true effect for each response; a fixed effect coefficient (a variable named treatment status; Supplementary Table 2 ) was used to estimate the log response ratio 61 . If the response had at least ten ‘sites’ (independent sampling units) and two measurements per site on average, we used the random effects of subsample (replicates within a site) nested within site to capture the dependence within a site and subsample (i.e., a Generalised Linear Mixed Model or GLMM 73 , 74 was implemented instead of a GLM); otherwise we fitted a GLM with only the fixed effects (Supplementary Table 2 ).
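The following minimal sketch, which is not the authors' released analysis code and uses hypothetical column names, shows how these rules might translate into model code for a single count-measure response analysed with a BACI design and the DiD estimator.

```r
library(lme4)

# At least ten sites with repeated measurements: GLMM with subsample nested
# within site as random effects, poisson errors, and a log link.
fit_glmm <- glmer(
  abundance ~ period * treatment + (1 | site / subsample),
  family = poisson(link = "log"),
  data   = dat
)

# Otherwise: a plain GLM with fixed effects only.
fit_glm <- glm(
  abundance ~ period * treatment,
  family = poisson(link = "log"),
  data   = dat
)

# Density, percentage, and size measures would instead use the quasipoisson,
# quasibinomial, and gaussian families described above.
```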
We fitted all models using R version 3.5.1 75 , and packages lme4 76 and MASS 77 . Code to replicate all analyses is available (see Data and Code Availability). We compared the estimates obtained using each study design (both in terms of point estimates and estimates with associated standard error) by their magnitude and sign.
A model-based quantification of the bias in study design estimates
We used a hierarchical Bayesian model motivated by the decomposition in Equation (1) to quantify the bias in different study design estimates. This model takes the estimated effects of impacts and their standard errors as inputs. Let $\hat{\beta}_{ij}$ be the true effect estimator in study $i$ using design $j$ and $\hat{\sigma}_{ij}$ be its estimated standard error from the corresponding GLM or GLMM. Our hierarchical model assumes:

$$\hat{\beta}_{ij} = \beta_i + \gamma_{ij} + \varepsilon_{ij}, \qquad \gamma_{ij} \sim \mathcal{N}(0, \sigma_j^2), \qquad\qquad (2)$$

where $\beta_i$ is the true effect for response $i$, $\gamma_{ij}$ is the bias of design $j$ in response $i$, and $\varepsilon_{ij}$ is the sampling noise of the statistical estimator. Although $\gamma_{ij}$ technically incorporates both the design bias and any misspecification (modelling) bias due to using GLMs or GLMMs (Equation (1)), we expect the modelling bias to be much smaller than the design bias 3, 11. We assume the statistical errors $\varepsilon_i$ within a response are related to the estimated standard errors through the following joint distribution:

$$\varepsilon_i \sim \mathcal{N}\big(0,\; \lambda^2 \,\mathrm{diag}(\hat{\sigma}_i)\, \Omega\, \mathrm{diag}(\hat{\sigma}_i)\big), \qquad\qquad (3)$$

where $\Omega$ is the correlation matrix for the different estimators in the same response and $\lambda$ is a scaling factor to account for possible over- or under-estimation of the standard errors.

This model effectively quantifies the bias of design $j$ using the value of $\sigma_j$ (larger values = more bias), accounting for within-response correlations using the correlation matrix $\Omega$ and for possible under-estimation of the standard errors using $\lambda$. We ensured that the prior distributions we used had very large variances so they would have a very small effect on the posterior distribution; accordingly, we placed disperse priors on the variance parameters.
We fitted the hierarchical Bayesian model in R version 3.5.1 using the Bayesian inference package rstan 78 .
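For readers wanting a starting point, the following condensed rstan sketch mirrors the model above but is not the authors' released code: it drops the correlation matrix $\Omega$ (treating estimates as independent given the design bias) and substitutes generic half-Cauchy priors for the disperse priors used in the paper; the array syntax follows the Stan version bundled with rstan at the time (pre-2.26).

```r
library(rstan)

stan_code <- "
data {
  int<lower=1> N;                      // design-by-response effect estimates
  int<lower=1> R;                      // number of responses
  int<lower=1> J;                      // number of study designs
  vector[N] beta_hat;                  // estimates from the GLM(M)s
  vector<lower=0>[N] se_hat;           // their estimated standard errors
  int<lower=1,upper=R> resp[N];        // response index i
  int<lower=1,upper=J> des[N];         // design index j
}
parameters {
  vector[R] beta;                      // true effect per response
  vector[N] gamma;                     // design bias terms
  vector<lower=0>[J] sigma;            // design-specific bias SD (sigma_j)
  real<lower=0> lambda;                // scaling of reported standard errors
}
model {
  sigma ~ cauchy(0, 5);                // generic weakly informative priors
  lambda ~ cauchy(0, 5);
  for (n in 1:N) {
    gamma[n] ~ normal(0, sigma[des[n]]);
    beta_hat[n] ~ normal(beta[resp[n]] + gamma[n], lambda * se_hat[n]);
  }
}
"

# fit <- stan(model_code = stan_code, data = stan_data, chains = 4)
# where stan_data is a list built from the 1,036 response-level estimates.
```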
Data availability
All data analysed in the current study are available from Zenodo, https://doi.org/10.5281/zenodo.3560856 . Source data are provided with this paper.
Code availability
All code used in the current study is available from Zenodo, https://doi.org/10.5281/zenodo.3560856 .
Donnelly, C. A. et al. Four principles to make evidence synthesis more useful for policy. Nature 558 , 361–364 (2018).
McKinnon, M. C., Cheng, S. H., Garside, R., Masuda, Y. J. & Miller, D. C. Sustainability: map the evidence. Nature 528 , 185–187 (2015).
Rubin, D. B. For objective causal inference, design trumps analysis. Ann. Appl. Stat. 2 , 808–840 (2008).
Peirce, C. S. & Jastrow, J. On small differences in sensation. Mem. Natl Acad. Sci. 3 , 73–83 (1884).
Fisher, R. A. Statistical methods for research workers . (Oliver and Boyd, 1925).
Angrist, J. D. & Pischke, J.-S. Mostly harmless econometrics: an empiricist’s companion . (Princeton University Press, 2008).
de Palma, A. et al . Challenges with inferring how land-use affects terrestrial biodiversity: study design, time, space and synthesis. in Next Generation Biomonitoring: Part 1 163–199 (Elsevier Ltd., 2018).
Sagarin, R. & Pauchard, A. Observational approaches in ecology open new ground in a changing world. Front. Ecol. Environ. 8 , 379–386 (2010).
Shadish, W. R., Cook, T. D. & Campbell, D. T. Experimental and quasi-experimental designs for generalized causal inference . (Houghton Mifflin, 2002).
Rosenbaum, P. R. Design of observational studies . vol. 10 (Springer, 2010).
Light, R. J., Singer, J. D. & Willett, J. B. By design: Planning research on higher education. By design: Planning research on higher education . (Harvard University Press, 1990).
Ioannidis, J. P. A. Why most published research findings are false. PLOS Med. 2 , e124 (2005).
Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349 , aac4716 (2015).
John, L. K., Loewenstein, G. & Prelec, D. Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol. Sci. 23 , 524–532 (2012).
Kerr, N. L. HARKing: hypothesizing after the results are known. Personal. Soc. Psychol. Rev. 2 , 196–217 (1998).
Zhao, Q., Keele, L. J. & Small, D. S. Comment: will competition-winning methods for causal inference also succeed in practice? Stat. Sci. 34 , 72–76 (2019).
Friedman, J., Hastie, T. & Tibshirani, R. The Elements of Statistical Learning . vol. 1 (Springer series in statistics, 2001).
Underwood, A. J. Beyond BACI: experimental designs for detecting human environmental impacts on temporal variations in natural populations. Mar. Freshw. Res. 42 , 569–587 (1991).
Stewart-Oaten, A. & Bence, J. R. Temporal and spatial variation in environmental impact assessment. Ecol. Monogr. 71 , 305–339 (2001).
Eddy, T. D., Pande, A. & Gardner, J. P. A. Massive differential site-specific and species-specific responses of temperate reef fishes to marine reserve protection. Glob. Ecol. Conserv. 1 , 13–26 (2014).
Sher, A. A. et al. Native species recovery after reduction of an invasive tree by biological control with and without active removal. Ecol. Eng. 111 , 167–175 (2018).
Imbens, G. W. & Rubin, D. B. Causal Inference in Statistics, Social, and Biomedical Sciences . (Cambridge University Press, 2015).
Greenhalgh, T. How to read a paper: the basics of Evidence Based Medicine . (John Wiley & Sons, Ltd, 2019).
Salmond, S. S. Randomized Controlled Trials: Methodological Concepts and Critique. Orthopaedic Nursing 27 , (2008).
Geijzendorffer, I. R. et al. How can global conventions for biodiversity and ecosystem services guide local conservation actions? Curr. Opin. Environ. Sustainability 29 , 145–150 (2017).
Dimick, J. B. & Ryan, A. M. Methods for evaluating changes in health care policy. JAMA 312 , 2401 (2014).
Ding, P. & Li, F. A bracketing relationship between difference-in-differences and lagged-dependent-variable adjustment. Political Anal. 27 , 605–615 (2019).
Christie, A. P. et al. Simple study designs in ecology produce inaccurate estimates of biodiversity responses. J. Appl. Ecol. 56 , 2742–2754 (2019).
Watson, M. et al. An analysis of the quality of experimental design and reliability of results in tribology research. Wear 426–427 , 1712–1718 (2019).
Kilkenny, C. et al. Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PLoS ONE 4 , e7824 (2009).
Christie, A. P. et al. The challenge of biased evidence in conservation. Conserv. Biol. 13577, https://doi.org/10.1111/cobi.13577 (2020).
Christie, A. P. et al. Poor availability of context-specific evidence hampers decision-making in conservation. Biol. Conserv. 248 , 108666 (2020).
Moscoe, E., Bor, J. & Bärnighausen, T. Regression discontinuity designs are underutilized in medicine, epidemiology, and public health: a review of current and best practice. J. Clin. Epidemiol. 68 , 132–143 (2015).
Goldenhar, L. M. & Schulte, P. A. Intervention research in occupational health and safety. J. Occup. Med. 36 , 763–778 (1994).
Junker, J. et al. A severe lack of evidence limits effective conservation of the World’s primates. BioScience https://doi.org/10.1093/biosci/biaa082 (2020).
Altindag, O., Joyce, T. J. & Reeder, J. A. Can Nonexperimental Methods Provide Unbiased Estimates of a Breastfeeding Intervention? A Within-Study Comparison of Peer Counseling in Oregon. Evaluation Rev. 43 , 152–188 (2019).
Chaplin, D. D. et al. The Internal And External Validity Of The Regression Discontinuity Design: A Meta-Analysis Of 15 Within-Study Comparisons. J. Policy Anal. Manag. 37 , 403–429 (2018).
Cook, T. D., Shadish, W. R. & Wong, V. C. Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within-study comparisons. J. Policy Anal. Manag. 27 , 724–750 (2008).
Ioannidis, J. P. A. et al. Comparison of evidence of treatment effects in randomized and nonrandomized studies. J. Am. Med. Assoc. 286 , 821–830 (2001).
dos Santos Ribas, L. G., Pressey, R. L., Loyola, R. & Bini, L. M. A global comparative analysis of impact evaluation methods in estimating the effectiveness of protected areas. Biol. Conserv. 246 , 108595 (2020).
Benson, K. & Hartz, A. J. A Comparison of Observational Studies and Randomized, Controlled Trials. N. Engl. J. Med. 342 , 1878–1886 (2000).
Smokorowski, K. E. et al. Cautions on using the Before-After-Control-Impact design in environmental effects monitoring programs. Facets 2 , 212–232 (2017).
França, F. et al. Do space-for-time assessments underestimate the impacts of logging on tropical biodiversity? An Amazonian case study using dung beetles. J. Appl. Ecol. 53 , 1098–1105 (2016).
Duvendack, M., Hombrados, J. G., Palmer-Jones, R. & Waddington, H. Assessing ‘what works’ in international development: meta-analysis for sophisticated dummies. J. Dev. Effectiveness 4 , 456–471 (2012).
Sutherland, W. J. et al. Building a tool to overcome barriers in research-implementation spaces: The Conservation Evidence database. Biol. Conserv. 238 , 108199 (2019).
Gusenbauer, M. & Haddaway, N. R. Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res. Synth. Methods 11 , 181–217 (2020).
Konno, K. & Pullin, A. S. Assessing the risk of bias in choice of search sources for environmental meta‐analyses. Res. Synth. Methods 11 , 698–713 (2020).
Butsic, V., Lewis, D. J., Radeloff, V. C., Baumann, M. & Kuemmerle, T. Quasi-experimental methods enable stronger inferences from observational data in ecology. Basic Appl. Ecol. 19 , 1–10 (2017).
Brownstein, N. C., Louis, T. A., O’Hagan, A. & Pendergast, J. The role of expert judgment in statistical inference and evidence-based decision-making. Am. Statistician 73 , 56–68 (2019).
Hahn, J., Todd, P. & Klaauw, W. Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica 69 , 201–209 (2001).
Slavin, R. E. Best evidence synthesis: an intelligent alternative to meta-analysis. J. Clin. Epidemiol. 48 , 9–18 (1995).
Slavin, R. E. Best-evidence synthesis: an alternative to meta-analytic and traditional reviews. Educ. Researcher 15 , 5–11 (1986).
Shea, B. J. et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ (Online) 358 , 1–8 (2017).
Sterne, J. A. C. et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ 355 , i4919 (2016).
Guyatt, G. et al. GRADE guidelines: 11. Making an overall rating of confidence in effect estimates for a single outcome and for all outcomes. J. Clin. Epidemiol. 66 , 151–157 (2013).
Davies, G. M. & Gray, A. Don’t let spurious accusations of pseudoreplication limit our ability to learn from natural experiments (and other messy kinds of ecological monitoring). Ecol. Evolution 5 , 5295–5304 (2015).
Lortie, C. J., Stewart, G., Rothstein, H. & Lau, J. How to critically read ecological meta-analyses. Res. Synth. Methods 6 , 124–133 (2015).
Gutzat, F. & Dormann, C. F. Exploration of concerns about the evidence-based guideline approach in conservation management: hints from medical practice. Environ. Manag. 66 , 435–449 (2020).
Greenhalgh, T. Will COVID-19 be evidence-based medicine’s nemesis? PLOS Med. 17 , e1003266 (2020).
Barlow, J. et al. The future of hyperdiverse tropical ecosystems. Nature 559 , 517–526 (2018).
Gurevitch, J. & Hedges, L. V. Statistical issues in ecological meta‐analyses. Ecology 80 , 1142–1149 (1999).
Stone, J. C., Glass, K., Munn, Z., Tugwell, P. & Doi, S. A. R. Comparison of bias adjustment methods in meta-analysis suggests that quality effects modeling may have less limitations than other approaches. J. Clin. Epidemiol. 117 , 36–45 (2020).
Rhodes, K. M. et al. Adjusting trial results for biases in meta-analysis: combining data-based evidence on bias with detailed trial assessment. J. R. Stat. Soc.: Ser. A (Stat. Soc.) 183 , 193–209 (2020).
Efthimiou, O. et al. Combining randomized and non-randomized evidence in network meta-analysis. Stat. Med. 36 , 1210–1226 (2017).
Welton, N. J., Ades, A. E., Carlin, J. B., Altman, D. G. & Sterne, J. A. C. Models for potentially biased evidence in meta-analysis using empirically based priors. J. R. Stat. Soc. Ser. A (Stat. Soc.) 172 , 119–136 (2009).
Turner, R. M., Spiegelhalter, D. J., Smith, G. C. S. & Thompson, S. G. Bias modelling in evidence synthesis. J. R. Stat. Soc.: Ser. A (Stat. Soc.) 172 , 21–47 (2009).
Shackelford, G. E. et al. Dynamic meta-analysis: a method of using global evidence for local decision making. bioRxiv 2020.05.18.078840, https://doi.org/10.1101/2020.05.18.078840 (2020).
Sutherland, W. J., Pullin, A. S., Dolman, P. M. & Knight, T. M. The need for evidence-based conservation. Trends Ecol. evolution 19 , 305–308 (2004).
Ioannidis, J. P. A. Meta-research: Why research on research matters. PLOS Biol. 16 , e2005468 (2018).
LaLonde, R. J. Evaluating the econometric evaluations of training programs with experimental data. Am. Econ. Rev. 76 , 604–620 (1986).
Long, Q., Little, R. J. & Lin, X. Causal inference in hybrid intervention trials involving treatment choice. J. Am. Stat. Assoc. 103 , 474–484 (2008).
Thomson Reuters. ISI Web of Knowledge. http://www.isiwebofknowledge.com (2019).
Stroup, W. W. Generalized linear mixed models: modern concepts, methods and applications . (CRC press, 2012).
Bolker, B. M. et al. Generalized linear mixed models: a practical guide for ecology and evolution. Trends Ecol. Evolution 24 , 127–135 (2009).
R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing (2019).
Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67 , 1–48 (2015).
Venables, W. N. & Ripley, B. D. Modern Applied Statistics with S . (Springer, 2002).
Stan Development Team. RStan: the R interface to Stan. R package version 2.19.3 (2020).
Acknowledgements
We are grateful to the following people and organisations for contributing datasets to this analysis: P. Edwards, G.R. Hodgson, H. Welsh, J.V. Vieira, authors of van Deurs et al. 2012, T. M. Grome, M. Kaspersen, H. Jensen, C. Stenberg, T. K. Sørensen, J. Støttrup, T. Warnar, H. Mosegaard, Axel Schwerk, Alberto Velando, Dolores River Restoration Partnership, J.S. Pinilla, A. Page, M. Dasey, D. Maguire, J. Barlow, J. Louzada, Jari Florestal, R.T. Buxton, C.R. Schacter, J. Seoane, M.G. Conners, K. Nickel, G. Marakovich, A. Wright, G. Soprone, CSIRO, A. Elosegi, L. García-Arberas, J. Díez, A. Rallo, Parks and Wildlife Finland, Parc Marin de la Côte Bleue. Author funding sources: T.A. was supported by the Grantham Foundation for the Protection of the Environment, Kenneth Miller Trust and Australian Research Council Future Fellowship (FT180100354); W.J.S. and P.A.M. were supported by Arcadia, MAVA, and The David and Claudia Harding Foundation; A.P.C. was supported by the Natural Environment Research Council via Cambridge Earth System Science NERC DTP (NE/L002507/1); D.A. was funded by Portugal national funds through the FCT – Foundation for Science and Technology, under the Transitional Standard – DL57 / 2016 and through the strategic project UIDB/04326/2020; M.A. acknowledges Koniambo Nickel SAS, and particularly Gregory Marakovich and Andy Wright; J.C.A. was funded through by Dirección General de Investigación Científica, projects PB97-1252, BOS2002-01543, CGL2005-04893/BOS, CGL2008-02567 and Comunidad de Madrid, as well as by contract HENARSA-CSIC 2003469-CSIC19637; A.A. was funded by Spanish Government: MEC (CGL2007-65176); B.P.B. was funded through the U.S. Geological Survey and the New York City Department of Environmental Protection; R.B. was funded by Comunidad de Madrid (2018-T1/AMB-10374); J.A.S. and D.A.B. were funded through the U.S. Geological Survey and NextEra Energy; R.S.C. was funded by the Portuguese Foundation for Science and Technology (FCT) grant SFRH/BD/78813/2011 and strategic project UID/MAR/04292/2013; A.D.B. was funded through the Belgian offshore wind monitoring program (WINMON-BE), financed by the Belgian offshore wind energy sector via RBINS—OD Nature; M.K.D. was funded by the Harold L. Castle Foundation; P.M.E. was funded by the Clackamas County Water Environment Services River Health Stewardship Program and the Portland State University Student Watershed Research Project; T.D.E., J.P.A.G. and A.P. were supported by funding from the New Zealand Department of Conservation (Te Papa Atawhai) and from the Centre for Marine Environmental & Economic Research, Victoria University of Wellington, New Zealand; F.M.F. was funded by CNPq-CAPES grants (PELD site 23 403811/2012-0, PELD-RAS 441659/2016-0, BEX5528/13-5 and 383744/2015-6) and BNP Paribas Foundation (Climate & Biodiversity Initiative, BIOCLIMATE project); B.P.H. was funded by NOAA-NMFS sea scallop research set-aside program awards NA16FM1031, NA06FM1001, NA16FM2416, and NA04NMF4720332; A.L.B. was funded by the Portuguese Foundation for Science and Technology (FCT) grant FCT PD/BD/52597/2014, Bat Conservation International student research fellowship and CNPq grant 160049/2013-0; L.C.M. acknowledges Secretaría de Ciencia y Técnica (UNRC); R.A.M. acknowledges Alaska Fisheries Science Center, NOAA Fisheries, and U.S. Department of Commerce for salary support; C.F.J.M. was funded by the Portuguese Foundation for Science and Technology (FCT) grant SFRH/BD/80488/2011; R.R. 
was funded by the Portuguese Foundation for Science and Technology (FCT) grant PTDC/BIA-BIC/111184/2009, by Madeira’s Regional Agency for the Development of Research, Technology and Innovation (ARDITI) grant M1420-09-5369-FSE-000002 and by a Bat Conservation International student research fellowship; J.C. and S.S. were funded by the Alabama Department of Conservation and Natural Resources; A.T. was funded by the Spanish Ministry of Education with a Formacion de Profesorado Universitario (FPU) grant AP2008-00577 and Dirección General de Investigación Científica, project CGL2008-02567; C.W. was funded by Strategic Science Investment Funding of the Ministry of Business, Innovation and Employment, New Zealand; J.S.K. acknowledges Boreal Peatland LIFE (LIFE08 NAT/FIN/000596), Parks and Wildlife Finland and Kone Foundation; J.J.S.S. was funded by the Mexican National Council on Science and Technology (CONACYT 242558); N.N. was funded by The Carl Tryggers Foundation; I.L.J. was funded by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada; D.D. and D.S. were funded by the French National Research Agency via the “Investment for the Future” program IDEALG (ANR-10-BTBR-04) and by the ALGMARBIO project; R.C.P. was funded by CSIRO and whose research was also supported by funds from the Great Barrier Reef Marine Park Authority, the Fisheries Research and Development Corporation, the Australian Fisheries Management Authority, and Queensland Department of Primary Industries (QDPI). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. The scientific results and conclusions, as well as any views or opinions expressed herein, are those of the author(s) and do not necessarily reflect those of NOAA or the Department of Commerce.
Author information
Authors and Affiliations
Conservation Science Group, Department of Zoology, University of Cambridge, The David Attenborough Building, Downing Street, Cambridge, CB3 3QZ, UK
Alec P. Christie, Philip A. Martin & William J. Sutherland
Centre of Marine Sciences (CCMar), Universidade do Algarve, Campus de Gambelas, 8005-139, Faro, Portugal
David Abecasis
Institut de Recherche pour le Développement (IRD), UMR 9220 ENTROPIE & Laboratoire d’Excellence CORAIL, Université de Perpignan Via Domitia, 52 avenue Paul Alduy, 66860, Perpignan, France
Mehdi Adjeroud
Museo Nacional de Ciencias Naturales, CSIC, Madrid, Spain
Juan C. Alonso & Carlos Palacín
School of Biological Sciences, University of Queensland, Brisbane, 4072, QLD, Australia
Tatsuya Amano
Education Faculty of Bilbao, University of the Basque Country (UPV/EHU). Sarriena z/g E-48940 Leioa, Basque Country, Spain
Alvaro Anton
U.S. Geological Survey, New York Water Science Center, 425 Jordan Rd., Troy, NY, 12180, USA
Barry P. Baldigo
Universidad Complutense de Madrid, Departamento de Biodiversidad, Ecología y Evolución, Facultad de Ciencias Biológicas, c/ José Antonio Novais, 12, E-28040, Madrid, Spain
Rafael Barrientos & Carlos A. Martín
Durrell Institute of Conservation and Ecology (DICE), School of Anthropology and Conservation, University of Kent, Canterbury, CT2 7NR, UK
Jake E. Bicknell
U.S. Geological Survey, Northern Prairie Wildlife Research Center, Jamestown, ND, 58401, USA
Deborah A. Buhl & Jill A. Shaffer
Northern Gulf Institute, Mississippi State University, 1021 Balch Blvd, John C. Stennis Space Center, Mississippi, 39529, USA
Just Cebrian
MARE – Marine and Environmental Sciences Centre, Dept. Life Sciences, University of Coimbra, Coimbra, Portugal
Ricardo S. Ceia
CFE – Centre for Functional Ecology, Dept. Life Sciences, University of Coimbra, Coimbra, Portugal
Departamento de Ciencias Naturales, Universidad Nacional de Río Cuarto (UNRC), Córdoba, Argentina
Luciana Cibils-Martina
CONICET, Buenos Aires, Argentina
Marine Institute, Rinville, Oranmore, Galway, Ireland
Sarah Clarke & Oliver Tully
National Center for Scientific Research, PSL Université Paris, CRIOBE, USR 3278 CNRS-EPHE-UPVD, Maison des Océans, 195 rue Saint-Jacques, 75005, Paris, France
Joachim Claudet
School of Biological Sciences, University of Western Australia, Nedlands, WA, 6009, Australia
Michael D. Craig
School of Environmental and Conservation Sciences, Murdoch University, Murdoch, WA, 6150, Australia
Sorbonne Université, CNRS, UMR 7144, Station Biologique, F.29680, Roscoff, France
Dominique Davoult & Doriane Stagnol
Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Ankerstraat 1, 8400, Ostend, Belgium
Annelies De Backer
Marine Science Institute, University of California Santa Barbara, Santa Barbara, CA, 93106, USA
Mary K. Donovan
Hawaii Institute of Marine Biology, University of Hawaii at Manoa, Honolulu, HI, 96822, USA
Baruch Institute for Marine & Coastal Sciences, University of South Carolina, Columbia, SC, USA
Tyler D. Eddy
Centre for Fisheries Ecosystems Research, Fisheries & Marine Institute, Memorial University of Newfoundland, St. John’s, Canada
School of Biological Sciences, Victoria University of Wellington, P O Box 600, Wellington, 6140, New Zealand
Tyler D. Eddy, Jonathan P. A. Gardner & Anjali Pande
Lancaster Environment Centre, Lancaster University, LA1 4YQ, Lancaster, UK
Filipe M. França
Fisheries, Aquatic Science and Technology Laboratory, Alaska Pacific University, 4101 University Dr., Anchorage, AK, 99508, USA
Bradley P. Harris
Natural Resources Institute Finland, Manamansalontie 90, 88300, Paltamo, Finland
Department of Biology, Memorial University, St. John’s, NL, A1B 2R3, Canada
Ian L. Jones
National Marine Science Centre and Marine Ecology Research Centre, Southern Cross University, 2 Bay Drive, Coffs Harbour, 2450, Australia
Brendan P. Kelaher
Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland
Janne S. Kotiaho
School of Resource Wisdom, University of Jyväskylä, Jyväskylä, Finland
Centre for Ecology, Evolution and Environmental Changes – cE3c, Faculty of Sciences, University of Lisbon, 1749-016, Lisbon, Portugal
Adrià López-Baucells, Christoph F. J. Meyer & Ricardo Rocha
Biological Dynamics of Forest Fragments Project, National Institute for Amazonian Research and Smithsonian Tropical Research Institute, 69011-970, Manaus, Brazil
Granollers Museum of Natural History, Granollers, Spain
Adrià López-Baucells
Department of Biological Sciences, University of New Brunswick, PO Box 5050, Saint John, NB, E2L 4L5, Canada
Heather L. Major
Voimalohi Oy, Voimatie 23, Voimatie, 91100, Ii, Finland
Aki Mäki-Petäys
Natural Resources Institute Finland, Paavo Havaksen tie 3, 90014 University of Oulu, Oulu, Finland
Fundación Migres CIMA Ctra, Cádiz, Spain
Beatriz Martín
Intergovernmental Oceanographic Commission of UNESCO, Marine Policy and Regional Coordination Section Paris 07, Paris, France
BioRISC, St. Catharine’s College, Cambridge, CB2 1RL, UK
Philip A. Martin & William J. Sutherland
Departamento de Ecología e Hidrología, Universidad de Murcia, Campus de Espinardo, 30100, Murcia, Spain
Daniel Mateos-Molina
RACE Division, Alaska Fisheries Science Center, National Marine Fisheries Service, NOAA, 7600 Sand Point Way NE, Seattle, WA, 98115, USA
Robert A. McConnaughey
European Commission, Joint Research Centre (JRC), Ispra, VA, Italy
Michele Meroni
School of Science, Engineering and Environment, University of Salford, Salford, M5 4WT, UK
Christoph F. J. Meyer
Victorian National Park Association, Carlton, VIC, Australia
Department of Earth, Environment and Life Sciences (DiSTAV), University of Genoa, Corso Europa 26, 16132, Genoa, Italy
Monica Montefalcone
Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
Norbertas Noreika
Chair of Plant Health, Institute of Agricultural and Environmental Sciences, Estonian University of Life Sciences, Tartu, Estonia
Biosecurity New Zealand – Tiakitanga Pūtaiao Aotearoa, Ministry for Primary Industries – Manatū Ahu Matua, 66 Ward St, PO Box 40742, Wallaceville, New Zealand
Anjali Pande
National Institute of Water & Atmospheric Research Ltd (NIWA), 301 Evans Bay Parade, Greta Point Wellington, New Zealand
CSIRO Oceans & Atmosphere, Queensland Biosciences Precinct, 306 Carmody Road, ST. LUCIA QLD, 4067, Australia
C. Roland Pitcher
Museo Nacional de Ciencias Naturales, CSIC, José Gutiérrez Abascal 2, E-28006, Madrid, Spain
Carlos Ponce
Fort Keogh Livestock and Range Research Laboratory, 243 Fort Keogh Rd, Miles City, Montana, 59301, USA
Matt Rinella
CIBIO-InBIO, Research Centre in Biodiversity and Genetic Resources, University of Porto, Vairão, Portugal
Ricardo Rocha
Departamento de Sistemas Físicos, Químicos y Naturales, Universidad Pablo de Olavide, ES-41013, Sevilla, Spain
María C. Ruiz-Delgado
El Colegio de la Frontera Sur, A.P. 424, 77000, Chetumal, QR, Mexico
Juan J. Schmitter-Soto
Division of Fish and Wildlife, New York State Department of Environmental Conservation, 625 Broadway, Albany, NY, 12233-4756, USA
Shailesh Sharma
University of Denver Department of Biological Sciences, Denver, CO, USA
Anna A. Sher
U.S. Geological Survey, Fort Collins Science Center, Fort Collins, CO, 80526, USA
Thomas R. Stanley
School for Marine Science and Technology, University of Massachusetts Dartmouth, New Bedford, MA, USA
Kevin D. E. Stokesbury
Georges Lemaître Earth and Climate Research Centre, Earth and Life Institute, Université Catholique de Louvain, 1348, Louvain-la-Neuve, Belgium
Aurora Torres
Center for Systems Integration and Sustainability, Department of Fisheries and Wildlife, 13 Michigan State University, East Lansing, MI, 48823, USA
Natural Resources Institute Finland, Latokartanonkaari 9, 00790, Helsinki, Finland
Teppo Vehanen
Manaaki Whenua – Landcare Research, Private Bag 3127, Hamilton, 3216, New Zealand
Corinne Watts
Statistical Laboratory, Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Wilberforce Road, Cambridge, CB3 0WB, UK
Qingyuan Zhao
Contributions
A.P.C., T.A., P.A.M., Q.Z., and W.J.S. designed the research; A.P.C. wrote the paper; D.A., M.A., J.C.A., A.A., B.P.B, R.B., J.B., D.A.B., J.C., R.S.C., L.C.M., S.C., J.C., M.D.C, D.D., A.D.B., M.K.D., T.D.E., P.M.E., F.M.F., J.P.A.G., B.P.H., A.H., I.L.J., B.P.K., J.S.K., A.L.B., H.L.M., A.M., B.M., C.A.M., D.M., R.A.M, M.M., C.F.J.M.,K.M., M.M., N.N., C.P., A.P., C.R.P., C.P., M.R., R.R., M.C.R., J.J.S.S., J.A.S., S.S., A.A.S., D.S., K.D.E.S., T.R.S., A.T., O.T., T.V., C.W. contributed datasets for analyses. All authors reviewed, edited, and approved the manuscript.
Corresponding author
Correspondence to Alec P. Christie.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Communications thanks Casper Albers, Samuel Scheiner, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information, a Peer Review File, a Description of Additional Supplementary Information, Supplementary Data 1, Supplementary Data 2, and Source Data are available for this article.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .
About this article
Cite this article
Christie, A.P., Abecasis, D., Adjeroud, M. et al. Quantifying and addressing the prevalence and bias of study designs in the environmental and social sciences. Nat Commun 11 , 6377 (2020). https://doi.org/10.1038/s41467-020-20142-y
Download citation
Received : 29 January 2020
Accepted : 13 November 2020
Published : 11 December 2020
DOI : https://doi.org/10.1038/s41467-020-20142-y
This article is cited by
Robust language-based mental health assessments in time and space through social media.
- Siddharth Mangalik
- Johannes C. Eichstaedt
- H. Andrew Schwartz
npj Digital Medicine (2024)
Is there a “difference-in-difference”? The impact of scientometric evaluation on the evolution of international publications in Egyptian universities and research centres
- Mona Farouk Ali
Scientometrics (2024)
Quantifying research waste in ecology
- Marija Purgar
- Tin Klanjscek
- Antica Culina
Nature Ecology & Evolution (2022)
Assessing assemblage-wide mammal responses to different types of habitat modification in Amazonian forests
- Paula C. R. Almeida-Maués
- Anderson S. Bueno
- Ana Cristina Mendes-Oliveira
Scientific Reports (2022)
Mitigating impacts of invasive alien predators on an endangered sea duck amidst high native predation pressure
- Kim Jaatinen
- Ida Hermansson
Oecologia (2022)
Research bias: What it is, Types & Examples
Researchers sometimes influence the process of a systematic inquiry, whether unintentionally or deliberately. This is known as research bias, and it can distort your results just like any other sort of bias.
There are no hard and fast rules about when research bias occurs, which means it can creep in at any stage of a study. Experimental mistakes and a lack of attention to all relevant factors are common ways it enters.
Research bias is one of the most common causes of study results with low credibility. Because it is often subtle and informal, you must be cautious when characterizing bias in research, and to reduce or prevent it you need to be able to recognize its characteristics.
This article will cover what it is, its type, and how to avoid it.
Content Index
- What is research bias?
- How does research bias affect the research process?
- Types of research bias with examples
- How QuestionPro helps in reducing bias in a research process
What is research bias?
Research bias occurs when the researchers conducting a study skew the process or the findings, intentionally or not, toward a particular outcome. It is often known as experimenter bias.
Bias creeps in when a study relies on the researcher's experience and judgment rather than on data analysis. The most important thing to know about bias is that it is unavoidable in many fields, so understanding research bias and reducing the effects of biased views is an essential part of any research planning process.
For example, when working with human subjects in social research, it is easy to become attached to a certain point of view, which compromises fairness.
How does research bias affect the research process?
Research bias can seriously affect the research process, weakening its integrity and leading to misleading or erroneous results. Here are some of the ways this bias might affect the research process:
Distorted research design
When bias is present, study results can be skewed or simply wrong, making the study less trustworthy and less valid. If bias affects how a study is set up, how data are collected, or how they are analyzed, it introduces systematic errors that move the results away from the true, unbiased values.
Invalid conclusions
Bias makes it hard to trust that the findings of a study are correct. Biased research can lead to unjustified or incorrect claims because the results may not reflect reality or may give an incomplete picture of the research question.
Misleading interpretations
Bias can lead to inaccurate interpretations of research findings. It can alter the overall comprehension of the research issue. Researchers may be tempted to interpret the findings in a way that confirms their previous assumptions or expectations, ignoring alternate explanations or contradictory evidence.
Ethical concerns
This bias poses ethical considerations. It can have negative effects on individuals, groups, or society as a whole. Biased research can misinform decision-making processes, leading to ineffective interventions, policies, or therapies.
Damaged credibility
Research bias undermines scientific credibility. Biased research can damage public trust in science and may reduce reliance on scientific evidence for decision-making.
Types of research bias with examples
Bias can appear in practically every aspect of quantitative research and qualitative research, and it can come from both the survey developer and the participants. Of all the types of bias in research, those that come directly from the survey maker are the easiest to deal with. Let's look at some of the most typical research biases.
Design bias
Design bias happens when the way a study is organized and the research methods chosen fail to account for the researcher's own biased views. The researcher must demonstrate that they are aware of this and have tried to mitigate its influence.
Another form of design bias develops after the research is completed and the results are analyzed. It occurs when the original concerns of the research are not reflected in how the findings are presented, which happens all too often these days.
For example, a researcher working on a survey about health benefits may overlook the limitations of the sample group. It's possible that the group tested was all male, or all over a particular age.
Selection bias or sampling bias
Selection bias occurs when the volunteers chosen to represent your research population do not actually reflect it, so that people with different experiences are left out.
Selection bias shows up in research in a variety of ways. When the sampling method itself builds a preference into the research, this is known as sampling bias, which is why selection bias is also referred to as sampling bias.
For example, research on a disease that relied heavily on white male volunteers cannot be generalized to the full population, which includes women and people of other races and communities.
Procedural bias
Procedural bias is a form of research bias that occurs when survey respondents are given insufficient time to complete a survey. As a result, participants are forced to submit half-formed thoughts, which do not accurately reflect their thinking.
Another source of this bias is using participants who are compelled to take part, as they are likely to rush through the survey so that they have time left for other things.
For example, if you ask your employees to complete a survey during their break, they may feel pressured to finish quickly, which may compromise the validity of their responses.
Publication or reporting bias
Publication bias, also known as reporting bias, is another form of bias that influences research. It refers to a situation in which positive outcomes are more likely to be reported than negative or null ones. Analysis bias can also make reporting bias more likely.
The publication standards for research articles in a particular field frequently reinforce this bias. Researchers sometimes choose not to disclose their outcomes if they believe the data do not support their theory.
For example, of seven studies on the antidepressant drug Reboxetine, only one was published; the others remained unpublished.
Measurement or data collection bias
A flaw in the data collection process or the measuring technique causes measurement bias, which is also known as data collection bias. It occurs in both qualitative and quantitative research methodologies.
In quantitative research, data collection bias can occur when you use an approach that is not appropriate for your research population. Instrument bias is one of the most common forms of measurement bias in quantitative studies; a faulty scale, for instance, would generate instrument bias and invalidate the experimental process.
For example, asking people who do not have internet access to complete a survey by email or on your website would introduce this kind of bias.
In qualitative research, data collection bias occurs when inappropriate survey questions are asked during an unstructured interview. Bad survey questions are those that lead the interviewee to make presumptions, and subjects are frequently hesitant to give socially undesirable responses for fear of criticism.
For example, a respondent may avoid answers that could come across as homophobic or racist in an interview.
Some more types of bias in research include the ones listed here. Researchers must understand these biases and reduce them through rigorous study design, transparent reporting, and critical evidence review:
- Confirmation bias: Researchers often search for, evaluate, and prioritize material that supports their existing hypotheses or expectations, ignoring contradictory data. This can lead to a skewed perception of results and perhaps biased conclusions.
- Cultural bias: Cultural bias arises when cultural norms, attitudes, or preconceptions influence the research process and the interpretation of results.
- Funding bias: Funding bias takes place when research is backed by a sponsor with a vested interest in the outcome. It can tilt the research design, data collection, analysis, and interpretation toward the funding source.
- Observer bias: Observer bias arises when the researcher or observer affects participants’ replies or behavior. Collecting data might be biased by accidental clues, expectations, or subjective interpretations.
How QuestionPro helps in reducing bias in a research process
QuestionPro offers several features and functionalities that can contribute to reducing bias in the research process. Here's how QuestionPro can help:
Randomization
QuestionPro allows researchers to randomize the order of survey questions or response alternatives. Randomization helps to remove order effects and limit bias from the order in which participants encounter the items.
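For illustration, the sketch below shows the basic idea behind per-respondent randomization, independent of any particular survey platform: each participant sees the same questions, but in an independently shuffled order, so no single ordering can systematically colour the responses. The question texts, respondent IDs, and seed are made up for the example and are not QuestionPro's implementation.

```python
import random

# Hypothetical question bank; in a real survey these would come from the survey platform.
questions = [
    "How satisfied are you with your current role?",
    "How often do you feel stressed at work?",
    "How likely are you to recommend this employer?",
]

def randomized_order(questions, respondent_id, study_seed="demo-study"):
    """Return an independently shuffled copy of the questions for one respondent.

    Seeding with the respondent ID keeps each ordering reproducible for auditing,
    while still varying it across respondents, which spreads out order effects.
    """
    rng = random.Random(f"{study_seed}-{respondent_id}")  # string seeds are accepted
    shuffled = list(questions)   # copy so the master list is untouched
    rng.shuffle(shuffled)
    return shuffled

for respondent_id in (101, 102, 103):
    print(respondent_id, randomized_order(questions, respondent_id))
```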
Branching and skip logic
Branching and skip logic capabilities in QuestionPro allow researchers to design customized survey pathways based on participants' responses. This enables tailored questioning, ensuring that only pertinent questions are asked of each participant; avoiding irrelevant or needless questions reduces the bias they can generate.
Diverse question types
QuestionPro supports a wide range of question types, including multiple-choice, Likert scale, matrix, and open-ended questions. Researchers can choose the most suitable question types to collect unbiased data, avoiding leading or suggestive questions that may influence participants' responses.
Anonymous responses
QuestionPro enables researchers to collect anonymous responses, protecting the confidentiality of participants. It can encourage participants to provide more unbiased and equitable feedback, especially when dealing with sensitive or contentious issues.
Data analysis and reporting
QuestionPro has powerful data analysis and reporting options, such as charts, graphs, and statistical analysis tools. These properties allow researchers to examine and interpret obtained data objectively, decreasing the role of bias in interpreting results.
Collaboration and peer review
QuestionPro supports peer review and researcher collaboration. It helps uncover and overcome biases in research planning, questionnaire formulation, and data analysis by involving several researchers and soliciting external opinions.
To deal with bias in research, you must first understand it. Knowing the different sorts of bias allows you to identify them readily, and having a clear picture of each form makes it easier to recognize bias wherever it appears.
QuestionPro provides many research tools and settings that can assist you in dealing with research bias. Try QuestionPro today to undertake your original bias-free quantitative or qualitative research.
Frequently Asked Questions
Research bias affects the validity and dependability of your research’s findings, resulting in inaccurate interpretations of the data and incorrect conclusions.
Bias should be avoided in research to ensure that findings are accurate, valid, and objective.
To avoid research bias, researchers should take proactive steps throughout the research process, such as developing a clear research question and objectives, designing a rigorous study, following standardized protocols, and so on.
Sampling Bias and How to Avoid It | Types & Examples
Published on May 20, 2020 by Pritha Bhandari . Revised on March 17, 2023.
Sampling bias occurs when some members of a population are systematically more likely to be selected in a sample than others. It is also called ascertainment bias in medical fields.
Sampling bias limits the generalizability of findings because it is a threat to external validity , specifically population validity. In other words, findings from biased samples can only be generalized to populations that share characteristics with the sample.
Table of contents
- Causes of sampling bias
- Types of sampling bias
- How to avoid or correct sampling bias
- Other types of research bias
- Frequently asked questions about sampling bias
Your choice of research design or data collection method can lead to sampling bias. This type of research bias can occur in both probability and non-probability sampling .
Sampling bias in probability samples
In probability sampling , every member of the population has a known chance of being selected. For instance, you can use a random number generator to select a simple random sample from your population.
Although this procedure reduces the risk of sampling bias, it may not eliminate it. If your sampling frame – the actual list of individuals that the sample is drawn from – does not match the population, this can result in a biased sample.
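As a minimal illustration of the idea, the sketch below draws a simple random sample from a sampling frame; the frame here is just a made-up list of IDs standing in for the real list of population members.

```python
import random

# Hypothetical sampling frame: the list of individuals the sample is drawn from.
# In practice this would be a register, customer list, staff directory, etc.
sampling_frame = [f"person_{i:04d}" for i in range(1, 1001)]

rng = random.Random(42)                      # fixed seed so the draw is reproducible
sample = rng.sample(sampling_frame, k=100)   # every member has an equal chance of selection

print(len(sample), sample[:5])
```

Note that this only guards against bias to the extent that the frame itself matches the target population; a clean random draw from an incomplete frame still yields a biased sample.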
Sampling bias in non-probability samples
A non-probability sample is selected based on non-random criteria. For instance, in a convenience sample, participants are selected based on accessibility and availability.
Non-probability sampling often results in biased samples because some members of the population are more likely to be included than others.
| Type | Explanation | Example |
|---|---|---|
| Self-selection bias | People with specific characteristics are more likely to agree to take part in a study than others. | People who are more thrill-seeking are likely to take part in pain research studies. This may skew the data. |
| Nonresponse bias | People who refuse to participate or drop out from a study systematically differ from those who take part. | In a study on stress and workload, employees with high workloads are less likely to participate. The resulting sample may not vary greatly in terms of workload. |
| Undercoverage bias | Some members of a population are inadequately represented in the sample. | Administering general national surveys online may miss groups with limited internet access, such as the elderly and lower-income households. |
| Survivorship bias | Successful people and objects are more likely to be represented in the sample than unsuccessful ones. | In scientific journals, there is strong publication bias towards positive results: successful research outcomes are published far more often than null findings. |
| Pre-screening or advertising bias | The way participants are pre-screened or where a study is advertised may bias a sample. | When seeking volunteers to test a novel sleep intervention, you may end up with a sample that is more motivated to improve their sleep habits than the rest of the population. As a result, they may have been likely to improve their sleep habits regardless of the effects of your intervention. |
| Healthy user bias | Volunteers for preventative interventions are more likely to pursue health-boosting behaviors and activities than other members of the population. | A sample in a preventative intervention has a better diet, higher physical activity levels, abstains from alcohol, and avoids smoking more than most of the population. The experimental findings may be a result of the treatment interacting with these characteristics of the sample, rather than just the treatment itself. |
Using careful research design and sampling procedures can help you avoid sampling bias.
- Define a target population and a sampling frame (the list of individuals that the sample will be drawn from). Match the sampling frame to the target population as much as possible to reduce the risk of sampling bias.
- Make online surveys as short and accessible as possible.
- Follow up on non-responders.
- Avoid convenience sampling .
Oversampling to avoid bias
Oversampling can be used to avoid sampling bias in situations where members of defined groups are underrepresented (undercoverage). This is a method of selecting respondents from some groups so that they make up a larger share of the sample than they actually do in the population.
After all data is collected, responses from oversampled groups are weighted to their actual share of the population to remove any sampling bias.
For example, researchers running a national survey might gather a nationally representative sample of 1500 respondents that oversamples Asian Americans. Random digit dialling is used to contact American households, and disproportionately larger samples are taken from regions with more Asian Americans. Of the 1500 respondents, 336 are Asian American. Based on this sample size, the researchers can be confident in their findings about Asian Americans.
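A minimal sketch of the weighting step described above, using made-up numbers: responses from the oversampled group are down-weighted to the group's actual share of the population before computing overall estimates. The population share, outcome values, and group labels are assumptions for illustration only.

```python
# Hypothetical figures: Asian Americans assumed to be 7% of the population,
# but 336 of the 1500 respondents (22.4%) after oversampling.
population_share = {"asian_american": 0.07, "other": 0.93}
sample_counts    = {"asian_american": 336,  "other": 1164}
n_total = sum(sample_counts.values())

# Weight = population share / sample share, so oversampled groups count for less
# and undersampled groups count for more in population-level estimates.
weights = {
    group: population_share[group] / (sample_counts[group] / n_total)
    for group in sample_counts
}

# Example: a weighted mean of some survey outcome recorded per group (made-up values).
group_means = {"asian_american": 3.8, "other": 3.2}
weighted_mean = sum(
    weights[g] * sample_counts[g] * group_means[g] for g in sample_counts
) / sum(weights[g] * sample_counts[g] for g in sample_counts)

print(weights)
print(round(weighted_mean, 2))   # matches the population-share-weighted average
```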
Cognitive bias
- Confirmation bias
- Baader–Meinhof phenomenon
Selection bias
- Sampling bias
- Ascertainment bias
- Attrition bias
- Self-selection bias
- Survivorship bias
- Nonresponse bias
- Undercoverage bias
- Hawthorne effect
- Observer bias
- Omitted variable bias
- Publication bias
- Pygmalion effect
- Recall bias
- Social desirability bias
- Placebo effect
A sample is a subset of individuals from a larger population . Sampling means selecting the group that you will actually collect data from in your research. For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.
In statistics, sampling allows you to test a hypothesis about the characteristics of a population.
Sampling bias occurs when some members of a population are systematically more likely to be selected in a sample than others.
Sampling bias is a threat to external validity – it limits the generalizability of your findings to a broader group of people.
Some common types of sampling bias include self-selection bias , nonresponse bias , undercoverage bias , survivorship bias , pre-screening or advertising bias, and healthy user bias.
Using careful research design and sampling procedures can help you avoid sampling bias . Oversampling can be used to correct undercoverage bias .
Samples are used to make inferences about populations . Samples are easier to collect data from because they are practical, cost-effective, convenient, and manageable.
Cite this Scribbr article
Bhandari, P. (2023, March 17). Sampling Bias and How to Avoid It | Types & Examples. Scribbr. Retrieved September 9, 2024, from https://www.scribbr.com/research-bias/sampling-bias/
[Three types of bias: distortion of research results and how that can be prevented]
Affiliation.
- 1 Universitair Medisch Centrum Utrecht, Julius Centrum voor Gezondheidswetenschappen en Eerstelijns Geneeskunde, Utrecht.
- PMID: 25714762
A systematic distortion of the relationship between a treatment, risk factor or exposure and clinical outcomes is denoted by the term 'bias'. Three types of bias can be distinguished: information bias, selection bias, and confounding. These three types of bias and their potential solutions are discussed using various examples.
Similar articles
- Selection bias and information bias in clinical research. Tripepi G, Jager KJ, Dekker FW, Zoccali C. Tripepi G, et al. Nephron Clin Pract. 2010;115(2):c94-9. doi: 10.1159/000312871. Epub 2010 Apr 21. Nephron Clin Pract. 2010. PMID: 20407272 Review.
- Science mapping analysis characterizes 235 biases in biomedical research. Chavalarias D, Ioannidis JP. Chavalarias D, et al. J Clin Epidemiol. 2010 Nov;63(11):1205-15. doi: 10.1016/j.jclinepi.2009.12.011. Epub 2010 Apr 18. J Clin Epidemiol. 2010. PMID: 20400265
- Limits for the Magnitude of M-bias and Certain Other Types of Structural Selection Bias. Flanders WD, Ye D. Flanders WD, et al. Epidemiology. 2019 Jul;30(4):501-508. doi: 10.1097/EDE.0000000000001031. Epidemiology. 2019. PMID: 31033689
- Acknowledging and Overcoming Nonreproducibility in Basic and Preclinical Research. Ioannidis JP. Ioannidis JP. JAMA. 2017 Mar 14;317(10):1019-1020. doi: 10.1001/jama.2017.0549. JAMA. 2017. PMID: 28192565 No abstract available.
- Bias, Confounding, and Interaction: Lions and Tigers, and Bears, Oh My! Vetter TR, Mascha EJ. Vetter TR, et al. Anesth Analg. 2017 Sep;125(3):1042-1048. doi: 10.1213/ANE.0000000000002332. Anesth Analg. 2017. PMID: 28817531 Review.
- Open access
- Published: 09 September 2024
The validation of short eating disorder, body dysmorphia, and Weight Bias Internalisation Scales among UK adults
- Dorottya Lantos 1 ,
- Darío Moreno-Agostino 2 , 4 ,
- Lasana T. Harris 3 ,
- George Ploubidis 2 ,
- Lucy Haselden 2 &
- Emla Fitzsimons 2
Journal of Eating Disorders, volume 12, Article number: 137 (2024)
When collecting data from human participants, it is often important to minimise the length of questionnaire-based measures. This makes it possible to ensure that the data collection is as engaging as possible, while it also reduces response burden, which may protect data quality. Brevity is especially important when assessing eating disorders and related phenomena, as minimising questions pertaining to shame-ridden, unpleasant experiences may in turn minimise any negative affect experienced whilst responding.
We relied on item response theory to shorten three eating disorder and body dysmorphia measures, while aiming to ensure that the information assessed by the scales remained as close to that assessed by the original scales as possible. We further tested measurement invariance, correlations among different versions of the same scales as well as different measures, and explored additional properties of each scale, including their internal consistency. Additionally, we explored the performance of the 3-item version of the modified Weight Bias Internalisation Scale and compared it to that of the 11-item version of the scale.
We introduce a 5-item version of the Eating Disorder Examination Questionnaire, a 3-item version of the SCOFF questionnaire, and a 3-item version of the Dysmorphic Concern Questionnaire. The results revealed that, across a sample of UK adults ( N = 987, ages 18–86, M = 45.21), the short scales had a reasonably good fit. Significant positive correlations between the longer and shorter versions of the scales and their significant positive, albeit somewhat weaker correlations to other, related measures support their convergent and discriminant validity. The results followed a similar pattern across the young adult subsample ( N = 375, ages 18–39, M = 28.56).
Conclusions
These results indicate that the short forms of the tested scales may perform similarly to the full versions.
Plain English summary
This manuscript introduces short versions of existing measures of eating disorders and body dysmorphia, specifically the Eating Disorder Examination Questionnaire , the SCOFF Questionnaire , and the Dysmorphic Concern Questionnaire. We further investigate the properties of the recently introduced 3-item short version of the modified Weight Bias Internalisation Scale. Across analyses including measurement invariance testing and bivariate correlations aiming to assess convergent and discriminant validity, we find support that the short scales may perform similarly to their longer versions. These short scales may contribute in meaningful ways to research where the brevity of questionnaire-type measures may make a difference by contributing to data quality.
The time participants volunteer to partake in research is invaluable. Ensuring that participants spend this time in a meaningful way and no unnecessary time is granted is not only an ethical priority, but also a way to obtain high quality data [ 1 , 2 ]. Questionnaire-type measures are among the most often used methods for data collection in the psychological and social sciences [ 3 , 4 ]. Historically, such scales have been designed to capture a given construct in an in-depth manner, often resulting in a long series of items, taking a long time to complete. More recently, advances in psychometrics have revealed that fewer items are in many cases sufficient to capture the same underlying construct, without losing meaningful information [ 5 , 6 ].
The amount of time taken to complete questionnaires is an especially important objective whilst designing test packages for longitudinal cohort studies. In preparation for the upcoming 2023 data sweep of the Millennium Cohort Study (MCS, 7,8), which has followed the lives of nearly 19,000 UK individuals born in 2000–2001, we aimed to optimise selected eating disorder and body dysmorphia scales among a sample of UK adults. The analyses presented here complement our recent analyses performed with the aim of comparing and optimising measures of depression, anxiety, and psychological distress in preparation for the same MCS data sweep [ 9 ]. Specifically, here we examined the properties of the 12-item short version of the Eating Disorder Examination Questionnaire (EDE-QS, 10), the 5-item SCOFF questionnaire [ 11 ], the 7-item Dysmorphic Concern Questionnaire (DCQ, 12), and the 11-item and 3-item versions of the modified Weight Bias Internalisation Scale (WBIS, 13,14).
Our aim was to ensure that these widely used scales can be administered in as little time as possible, whilst capturing similar variance and information to their original versions. Experiences of eating disorders and body dysmorphia may be highly unpleasant and shame-ridden [ 15 , 16 , 17 ]. Thus, asking a limited number of questions regarding such experiences may be especially important in ensuring that participants are exposed to as little amount of stress as possible, without compromising data quality. We further tested the measurement invariance of these self-report scales in order to ensure that they can be used across cohorts, enabling measurement harmonisation and thus facilitating cross-cohort comparisons [ 18 , 19 ].
Optimising questionnaires
Questionnaire-based measures historically tended to comprise many, often dozens of items. This was driven by the intention to truly capture an underlying construct as accurately as possible. However, using such measures may be counterproductive in some cases, as studies which last too long also compromise data quality [ 2 ]. Often cited causes for this include boredom effects (i.e., participants’ performance/attention decreases as they become bored and lose interest), response burden (i.e., the effort required to complete questionnaire, which increases as the length of the questionnaire increases), and fatigue (i.e., participants’ performance/attention decreases as they become tired). Longer scales are additionally more likely to result in missing data. One of our aims in this study is to optimise self-report questionnaires for brevity.
Another key issue with questionnaires is potential bias. Several factors may influence the way in which people perceive certain questions, including cultural differences or other differences related to age, including the historical time during which one grew up, etc. [ 20 ]. Such bias may lead individuals to interpret questions differently, which ultimately may lead to different constructs being assessed by the same scale [ 21 , 22 , 23 ]. Thus, in this study we further assess measurement invariance across sex and age groups.
Selecting and developing measures for cohort studies
Longitudinal birth cohort studies follow a cohort of participants born around the same time. Such designs allow researchers the opportunity to study the effects of social, economic, and environmental factors on key outcomes across the lifespan [ 24 , 25 ]. Several birth cohort studies conducted throughout the past decade in the UK are currently still running, including cohorts born in 1946, 1958, 1970, 2000/01. An important factor when selecting and developing measures for inclusion in birth cohort studies is brevity. Brevity contributes to ensuring that participants in longitudinal studies remain engaged and minimises attrition. One way to do this is to optimise scales by minimising their number of items. However, it must be ensured that the included scales are valid and reliable. In addition, it is important to ensure that all measures assess the same construct across different groups, such as across sex or age groups in a population [ 18 , 19 ]. This further facilitates the comparison across studies, including cross-cohort comparisons.
Overview of the study
Using an online survey, we explored the properties of existing self-report measures of eating disorders, body dysmorphia, and weight bias internalisation among UK adults. We aimed to optimise selected measures by reducing the number of items which participants are required to respond to. We further examined the same characteristics among only the young adult subsample (18–39 years) and ensured that the optimised short versions of each scale exhibit similar properties in the young adult sample and the full sample. In preparation for the next MCS [ 7 , 8 ] data sweep, this age group is of special interest. To gather data of the highest possible quality, keeping in mind the limited availability of survey time, we aim to inform the selection of self-report questionnaires for use in the upcoming data sweep (age 22, 2023) with the results presented here.
More specifically, our aim was to find a short set of items that correlate highly with longer widely used scales, but which are less time-consuming to complete. We have tried to shorten the scales based on multiple factors: retaining the maximum amount of information across different levels of the underlying construct, thinking of the general (non-clinical) population, and focusing on reducing participant burden. We have assessed whether these shorter measures may rank-order the participants in a similar way as the longer versions. While undoubtedly there is a loss of granularity with the shortening of scales, data quality may, overall, be improved this way if, for example, these scales are to be embedded in lengthy questionnaires. Under such circumstances, reducing participant burden is especially important, as high burden may lead to a lack of attention, disengagement, or missing data, among other problems. Thus, while shorter scales do not necessarily mean better scales, there may certainly be cases where shorter options are better at meeting the researchers' aims.
As the analyses presented here additionally allow us to optimise these same scales across UK adults of all ages, we aim to inform other researchers who may be conducting studies in this population. We tested measurement invariance to ensure that the scales tested here assessed the same constructs across sex and age groups. The online survey included additional measures of depression, anxiety, and psychological distress as well, which are explored in detail elsewhere [ 9 ]. The study was preregistered (https://osf.io/bk9xs). Ethical approval was obtained from the Ethics Committee of University College London. All data and syntax files are available via OSF (https://osf.io/vg4a9/).
Participants
A sample of 1,068 UK adults started the survey. The sample was recruited via Prolific (www.prolific.com) to closely mimic one that is representative of the UK population. To recruit a sample that approximates representativeness, Prolific uses data from the UK Office of National Statistics, and matches participants to the national population as closely as possible on age, gender, and ethnicity. We removed the data of 8 participants who gave consent to partaking but did not consent to the storage of their data, as well as 40 participants who only filled in the consent form and nothing else. We excluded a further 33 participants from data analysis due to incorrect responses to (one or both) attention check questions (e.g., Please select agree). The final sample consisted of 987 participants (463 males, 505 females, 2 participants indicated that they did not wish to share their sex), ages 18–86, M = 45.21, SD = 15.61. Seventeen participants only partially completed the survey, and their demographic details were thus missing. Participants were recruited via Prolific Academic and reimbursed £7.50 for their time. Across some of the analyses we were interested primarily in the responses of young adults, and hence completed them by including only the 375 participants who were aged 18–39 (M = 28.56, SD = 6.39, 184 males, 191 females).
Data was collected as part of a larger project in November, 2021 (see for further details: [ 9 ]. We created an online survey using Qualtrics software. Participants were first presented with an informed consent form and information sheet detailing their tasks throughout the study. They next completed several psychometric questionnaires, including measures focusing on the assessment of eating disorders and body dysmorphia described below. All scales were presented in a randomized order across participants. Finally, participants responded to demographic questions (sex, gender, age, ethnicity), were debriefed and thanked for their time.
Eating disorders were assessed using the 12-item short version of the EDE-QS [ 10 ], the 5-item SCOFF questionnaire [ 11 ], and the 22-item eating disorder diagnostic scale (EDDS, 23,24).
The EDE-QS [ 10 ] was completed by 972 participants. Participants responded to 10 items of the EDE-QS (e.g., On how many of the past 7 days have you had a definite fear that you might gain weight? ) on a 4-point scale with response options 0 = 0 days, 1 = 1–2 days, 2 = 3–5 days, 3 = 6–7 days; and to two items (e.g., Over the past 7 days, how dissatisfied have you been with your weight or shape? ) on a 4-point scale with response options 0 = not at all, 1 = slightly, 2 = moderately, 3 = markedly. Participants’ responses were summed, with higher scores indicating an increased presence of characteristics of eating disorders.
The SCOFF [ 11 ] was completed by 975 participants. Participants completed 5 items of the questionnaire (e.g., Do you make yourself sick because you feel uncomfortably full? ) using binary yes/no responses. We scored ‘yes’ responses as 1 and ‘no’ responses as 0, and summed participants’ answers, with higher scores indicating a greater likelihood for the presence of eating disorders.
The EDDS [ 26 , 27 ] was completed by 974 participants. The 22 items which participants completed included a variety of response methods, e.g., questions asked participants to enter their weight and height, to respond to binary questions with yes/no responses (e.g., During the times when you ate an unusually large amount of food, did you experience a loss of control (feel you couldn’t stop eating or control what or how much you were eating)? ), or to respond to 15 point scales (e.g., How many times per week on average over the past 3 months have you made yourself vomit to prevent weight gain or counteract the effects of eating, with response options between 0 and 14), among others. We used existing code [ 27 ] to calculate index scores (raw eating disorder composite score and Z-transformed eating disorder composite score) based on participants’ responses, where higher scores indicate a greater likelihood for the presence of eating disorders. Note that as a diagnostic tool this scale corresponds directly to the DSM-IV rather than the DSM-V diagnostic criteria of eating disorders.
Body dysmorphia was assessed using the 4-item body dysmorphic disorder questionnaire (BDDQ, 25) and the 7-item DCQ [ 12 ]. The BDDQ [ 28 ] was completed by 997 participants. This scale is made up of four core questions, where each question is presented based on participants’ previous responses (e.g., the question ‘Is your main concern with how you look that you aren’t thin enough or that you might get too fat?’ is only presented if a participant responds ‘yes’ to the question ‘Are you worried about how you look?’). This scale functions as a diagnostic tool for eating disorders. Following the scoring guidelines, we coded participants either as being at risk of an eating disorder (coded 1, overall sample: N = 183 out of 987; young adults: N = 109 out of 375) or not (coded 0).
The DCQ [ 12 ] was completed by 977 participants. Participants responded to the 7 items of the DCQ (e.g., Have you ever been very concerned about some aspect of your physical appearance? ) on a 4-point scale with response options 0 = not at all, 1 = same as most people, 2 = more than most people, 3 = much more than most people. Participants’ responses were summed, with higher scores indicating increased body dysmorphia.
Weight bias internalisation was assessed using the 11-item [ 14 ] and 3-item [ 13 ] versions of the WBIS. The scales were completed by 978 participants. Participants responded to the items (e.g., I hate myself for my weight) on a 7-point Likert scale with response options ranging from 1 = strongly disagree to 7 = strongly agree. Participants’ responses on selected items were reverse scored and all scores were summed in a way that higher scores reflect increased weight bias internalisation.
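As an illustration of this scoring step (not the authors' actual code), the snippet below reverse-scores hypothetical reverse-keyed items on a 7-point scale and sums the total; which items are reverse-keyed is assumed for the example and should be taken from the scale's scoring manual.

```python
def score_wbis(responses, reverse_keyed=(0,)):
    """Sum an 11-item WBIS response set scored 1-7.

    `responses` is a list of 11 integers between 1 and 7.
    `reverse_keyed` holds the zero-based positions of reverse-scored items
    (assumed here for illustration only).
    On a 1-7 scale, reversing an item is 8 - response.
    """
    scored = [
        8 - r if i in reverse_keyed else r
        for i, r in enumerate(responses)
    ]
    return sum(scored)

example = [2, 5, 4, 6, 3, 5, 4, 2, 6, 5, 3]
print(score_wbis(example))   # higher totals reflect greater weight bias internalisation
```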
Depression , anxiety , and psychological distress were also assessed as part of the survey, though these scales are examined in detail elsewhere [ 9 ]. The 10-item K10 scale and the 6-item K6 scale embedded in it [ 29 ], the 9-item version of the Malaise Inventory [ 30 , 31 ], the PHQ-9 [ 32 , 33 ], PHQ-2 [ 34 ], GAD-7 [ 35 ], and GAD-2 [ 36 ] were included (see the Supplementary Materials for further details).
Data analyses
Measurement properties.
We used MPlus version 8.7 [ 37 ] to explore measurement properties with a latent variable modelling approach. To test the latent structure of each self-report measure we used confirmatory factor analyses with a robust mean and variance adjusted weighted least squares (WLSMV) estimator, with either a model for binary (Yes vs. No responses) or ordered categorical data (questionnaires with multiple ordered response options) depending on the type of responses used for each scale. Because each of the self-report questionnaires which we focus on here have well-established factor structures, we relied on confirmatory factor analyses. We used the root mean square error of approximation (RMSEA, [ 38 ]), the comparative fit index (CFI, [ 39 ]), and the Tucker-Lewis Index (TLI, [ 40 ]) to determine model fit. We interpreted RMSEA values up to 0.05 as indicating good fit, and values up to 0.08 as indicating adequate fit [ 41 ]. In the cases of CFI and TLI, we interpreted values greater than 0.90 as indicating adequate, and those greater than 0.95 as indicating good model fit [ 42 ].
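For readers unfamiliar with these indices, the sketch below computes RMSEA, CFI, and TLI from the chi-square statistics of a fitted model and its baseline (independence) model, using the standard maximum-likelihood formulas. MPlus applies robust corrections for the WLSMV estimator, so these plain formulas are illustrative rather than a reproduction of the reported values, and the chi-square numbers in the example are invented.

```python
import math

def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """RMSEA, CFI, and TLI from model (m) and baseline (b) chi-square statistics.

    Standard (non-robust) formulas:
      RMSEA = sqrt(max(chi2_m - df_m, 0) / (df_m * (n - 1)))
      CFI   = 1 - max(chi2_m - df_m, 0) / max(chi2_b - df_b, chi2_m - df_m, 0)
      TLI   = ((chi2_b/df_b) - (chi2_m/df_m)) / ((chi2_b/df_b) - 1)
    """
    rmsea = math.sqrt(max(chi2_m - df_m, 0) / (df_m * (n - 1)))
    cfi = 1 - max(chi2_m - df_m, 0) / max(chi2_b - df_b, chi2_m - df_m, 0)
    tli = ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1)
    return rmsea, cfi, tli

# Hypothetical values for a 5-item, one-factor model fitted to N = 987 cases.
rmsea, cfi, tli = fit_indices(chi2_m=18.4, df_m=5, chi2_b=2100.0, df_b=10, n=987)
print(f"RMSEA={rmsea:.3f}, CFI={cfi:.3f}, TLI={tli:.3f}")
```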
Finally, we plotted test information functions (TIF) to evaluate the precision of measurement of the self-report questionnaires using MPlus version 8.7 [ 37 ]. TIF plots illustrate Fisher information, an indicator of the precision or reliability of the measure (owing to its inverse relationship with the standard error of measurement), at different levels of the underlying latent variable [ 43 ]. All analyses exploring the properties of the self-report questionnaires were conducted on the complete sample as well as on the young adult subsample.
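To make the idea of a test information function concrete, here is a small sketch for a two-parameter logistic (2PL) IRT model. The actual analyses used models for binary and ordered categorical items estimated in MPlus, so this binary-item version is only an illustration, and the discrimination and difficulty values are invented.

```python
import numpy as np

def test_information(theta, discriminations, difficulties):
    """Test information for a 2PL model: I(theta) = sum_i a_i^2 * P_i * (1 - P_i)."""
    theta = np.asarray(theta, dtype=float)
    info = np.zeros_like(theta)
    for a, b in zip(discriminations, difficulties):
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # probability of endorsing item i
        info += a**2 * p * (1 - p)                   # Fisher information contributed by item i
    return info

theta_grid = np.linspace(-3, 3, 7)
a_params = [1.8, 2.1, 1.5]          # hypothetical discrimination parameters
b_params = [0.5, 1.0, 1.5]          # hypothetical difficulty (location) parameters

tif = test_information(theta_grid, a_params, b_params)
se = 1.0 / np.sqrt(tif)             # standard error of measurement is the inverse square root
for t, i, s in zip(theta_grid, tif, se):
    print(f"theta={t:+.1f}  info={i:.2f}  SE={s:.2f}")
```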
Item reduction
We aimed to optimise two of the eating disorder measures, the EDE-QS and SCOFF, and two of the body dysmorphia measures, the DCQ and WBIS, by shortening them using item response theory. The diagnostic measures, the EDDS and BDDQ, served instead as measures against which we could validate the emerging results. We relied on the factor analyses conducted for the EDE-QS, SCOFF, DCQ, and WBIS to examine their general properties. Our approach was to take a small number of items which load the highest on the underlying factors (i.e., those with the highest discrimination parameter, ideally three items) to create the short scale, while ensuring that the TIF remains as similar as possible to that of the original scale and that internal consistency also remains optimal.
As the measures included in the present study may also be used to screen clinical populations, certain items may provide limited information in the general population despite being important in clinical samples. Because we aimed to develop short measures for use in nonclinical samples, we additionally took the item thresholds into consideration, attempting to avoid the inclusion of items that may be less informative in the target sample. Where item thresholds were very high, resulting in low item endorsement and, consequently, low variability in a general (nonclinical) population such as that of the MCS, we preferred items with somewhat lower loadings but with thresholds closer to the centre of the distribution of latent factor scores. Unless otherwise noted, the thresholds did not argue against retaining any of the selected items.
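As a minimal sketch of the selection logic described in the two paragraphs above, assuming the hypothetical lavaan fit from the earlier CFA sketch, the standardised loadings and thresholds can be inspected side by side before choosing the short-scale items.

```r
# `fit` is the hypothetical lavaan CFA object from the sketch above
est <- standardizedSolution(fit)

# Loadings (discrimination): rows where the operator is "=~"
loadings <- est[est$op == "=~", c("rhs", "est.std")]
loadings[order(-loadings$est.std), ]   # candidate items, highest loading first

# Thresholds: rows where the operator is "|"; very high thresholds flag items
# that are rarely endorsed in a general-population sample
est[est$op == "|", c("lhs", "rhs", "est.std")]
```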
Measurement invariance
To determine whether the measurement properties of the scales were equivalent across sex and age groups, we used a measurement invariance testing strategy. To compare ages, we split the sample into younger adults (18–39 years) and older adults (40+ years), as in the previous analyses. We tested measurement invariance to explore any potential bias within the self-report questionnaires across sexes or age groups caused by measurement error [ 18 , 19 , 44 , 45 ]. We conducted the analyses across four groups (sex × age: younger males, older males, younger females, and older females). We used a WLSMV estimator and tested two levels of invariance: configural invariance, without constraining any measurement parameters to be equal across the groups, and scalar invariance, where the items' loadings as well as their thresholds are constrained to be equal across the groups. We then compared the goodness-of-fit indices of the two models. Since the chi-square difference test is very sensitive to sample size, invariance was also judged using additional fit indices: models in which the loss of fit was less than 0.01 for CFI and less than 0.015 for RMSEA were taken to meet the criteria for invariance [ 46 , 47 ]. These analyses were conducted using MPlus version 8.7 [ 37 ].
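Again for orientation only, a configural-versus-scalar comparison of the kind described above could be sketched in R with lavaan; the grouping variable and data are hypothetical, and the published analyses were run in MPlus.

```r
library(lavaan)

model <- 'dysmorphia =~ dcq1 + dcq2 + dcq3 + dcq4 + dcq5 + dcq6 + dcq7'

# Hypothetical grouping variable combining sex and age band (four groups)
dat$group <- interaction(dat$sex, dat$age_band)

# Configural model: no equality constraints across groups
fit_config <- cfa(model, data = dat, ordered = TRUE, estimator = "WLSMV",
                  group = "group")

# Scalar model: loadings and thresholds constrained equal across groups
fit_scalar <- cfa(model, data = dat, ordered = TRUE, estimator = "WLSMV",
                  group = "group",
                  group.equal = c("loadings", "thresholds"))

# Loss of fit under 0.01 (CFI) and 0.015 (RMSEA) is taken to support invariance
fitMeasures(fit_config, c("cfi.scaled", "rmsea.scaled"))
fitMeasures(fit_scalar, c("cfi.scaled", "rmsea.scaled"))
```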
Note that this strategy could not be implemented for scales with three or fewer items, since in those cases the configural model is just-identified at best, resulting in non-meaningful goodness-of-fit indices that cannot be compared to those from models with invariance constraints. It was thus not possible to test measurement invariance in the short versions of the scales comprising only three items. We performed the analyses on the 12- and 5-item EDE-QS, the 5-item SCOFF, the 7-item DCQ, and the 11-item WBIS. This allowed us to detect potential differences in the measurement properties of the larger scales that might affect the shorter versions.
Scale properties
We first explored scale properties by examining descriptive statistics. To test for differences on the key measures between sexes and between age groups (i.e., 18–39 year olds vs. 40+ year olds), we ran independent samples t-tests. We also conducted 2 × 2 ANOVAs to explore any interactions between sex and age group. The two participants who did not disclose their sex were excluded from the analyses in which sex differences were tested. We used SPSS 27.0 to conduct these analyses. We used the Omega macro for SPSS [ 48 ] to test the internal consistency of the scales with McDonald's omega total ( ω t ) coefficient [ 49 ].
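The internal consistency and group comparisons were computed in SPSS; a rough R analogue (an assumption on our part, not the authors' code) could look like the following, with `dat`, `items`, and the score and grouping variables hypothetical.

```r
library(psych)

# McDonald's omega total for a hypothetical item set (polychoric correlations)
omega(items, nfactors = 1, poly = TRUE)$omega.tot

# Sex difference on a hypothetical total score (independent samples t-test)
t.test(dcq_total ~ sex, data = dat)

# 2 x 2 ANOVA: sex by age group; partial eta squared could be obtained with,
# e.g., the effectsize package if desired
summary(aov(dcq_total ~ sex * age_band, data = dat))
```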
Correlations
We computed bivariate correlations between the long and short versions of the eating disorder and body dysmorphia measures, and between these measures and those of depression, anxiety, and psychological distress. This allowed us to explore the equivalence in rank ordering across the measures, along with convergent and discriminant validity.
Measurement properties & item reduction
We first conducted confirmatory factor analyses on the EDE-QS, SCOFF, DCQ, and WBIS scores. Based on these analyses, we created the short versions of the scales, relying on the items with the highest discrimination parameters (Figs. 1, 2, 3 and 4). The fit statistics of the full and shortened scales are presented in Table 1, while the TIFs of the scalar models are presented in the Supplementary Materials. RMSEA values indicated adequate fit for the long and short versions of the SCOFF, the 3-item WBIS, and the 3-item DCQ, while the CFI and TLI values indicated good fit for the remaining scales as well. The only exception was the 12-item EDE-QS when assessed in the overall sample rather than the young adult subsample; nevertheless, this scale also showed adequate fit.
Fig. 1 The Results of a Confirmatory Factor Analysis (Standardized Coefficients) on the 12-Item EDE-QS in the ( A ) Full Sample and ( B ) Young Adult Subsample, and on the 5-Item EDE-QS in the ( C ) Full Sample and ( D ) Young Adult Subsample. Note: The variance of the factors was fixed to 1 in all cases
Fig. 2 The Results of a Confirmatory Factor Analysis (Standardized Coefficients) on the 5-Item SCOFF in the ( A ) Full Sample and ( B ) Young Adult Subsample, and on the 3-Item SCOFF in the ( C ) Full Sample and ( D ) Young Adult Subsample. Note: The variance of the factors was fixed to 1 in all cases
Fig. 3 The Results of a Confirmatory Factor Analysis (Standardized Coefficients) on the 7-Item DCQ in the ( A ) Full Sample and ( B ) Young Adult Subsample, and on the 3-Item DCQ in the ( C ) Full Sample and ( D ) Young Adult Subsample. Note: The variance of the factors was fixed to 1 in all cases
Fig. 4 The Results of a Confirmatory Factor Analysis (Standardized Coefficients) on the 11-Item WBIS in the ( A ) Full Sample and ( B ) Young Adult Subsample, and on the 3-Item WBIS in the ( C ) Full Sample and ( D ) Young Adult Subsample. Note: The variance of the factors was fixed to 1 in all cases
Eating disorder measures
In the case of the 12-item EDE-QS [ 10 ], the three items with the highest loadings did not match between the analysis conducted on the full sample and that conducted on the young adult subsample (Fig. 1). Our aim throughout the study was to develop short scales that are suitable both for a general UK population and for the young adult population, thereby facilitating test-retest use within a single cohort as well as measurement harmonisation across different UK-based cohorts. For this reason, we chose the items with the three highest loadings from both analyses, resulting in a five-item scale. The final items were ‘On how many of the past 7 days has thinking about your weight or shape made it very difficult to concentrate on things you are interested in (such as working, following a conversation or reading)?’, ‘On how many of the past 7 days have you had a definite fear that you might gain weight?’, ‘On how many of the past 7 days have you had a strong desire to lose weight?’, ‘Over the past 7 days has your weight or shape influenced how you think about (judge) yourself as a person?’, and ‘Over the past 7 days, how dissatisfied have you been with your weight or shape?’ (Appendix A). Footnote 3 These items cover a range of the characteristics of eating disorders but do not include more clinically salient behaviours such as purging, suggesting that the short scale may be best suited to the general rather than a clinical population.
In the case of the 5-item SCOFF [ 11 ], we selected the three items with the highest loadings, which matched across the analysis conducted on the full sample and that conducted on the young adult subsample. These items were ‘Do you make yourself sick because you feel uncomfortably full?’, ‘Do you worry you have lost control over how much you eat?’, and ‘Would you say that food dominates your life?’ (Fig. 2, Appendix B). Note that the thresholds (overall sample: item 1 = 1.63, item 2 = 0.51, item 3 = 0.99, item 4 = 1.09, item 5 = 0.74; young adult sample: item 1 = 1.39, item 2 = 0.34, item 3 = 0.86, item 4 = 0.88, item 5 = 0.61) suggested that item 1 of the SCOFF, though it may hold valuable information in a clinical sample, may be less useful in the general population. Indeed, this item was endorsed least often in both the overall sample (only 50 out of 975 participants responded ‘yes’) and the young adult sample (only 31 out of 375 participants responded ‘yes’). This reflects the content of the item, which asks about vomiting on purpose and may be more applicable to clinical populations. For this reason, we explored a 3-item version of the SCOFF that did not include item 1. These analyses, however, indicated that when item 1 was exchanged for the next highest-loading item, item 3, the latter’s loading in the three-item model was poor (overall sample: item 2 = 0.92, item 3 = 0.32, item 5 = 0.77; young adult sample: item 2 = 0.89, item 3 = 0.44, item 5 = 0.73). We thus retained the 3-item SCOFF that included item 1, despite its seemingly greater suitability for clinical populations.
Body dysmorphia measure
In the case of the 7-item DCQ [ 12 ], we selected the three items with the highest loadings, which matched across the analysis conducted on the full sample and that conducted on the young adult subsample. These were ‘Have you ever been very concerned about some aspect of your physical appearance?’ , ‘Have you ever spent a lot of time worrying about a defect in your appearance / bodily functioning?’ , ‘Have you ever spent a lot of time covering up defects in your appearance / bodily functioning?’ (Fig. 3 , Appendix C ).
Weight bias internalisation measure
A 3-item short version of the 11-item modified WBIS has previously been introduced [ 13 , 14 ]. Our analyses identified the same three items as those with the highest loadings when the full sample was taken into account. These were ‘I feel anxious about my weight because of what people might think of me’, ‘Whenever I think a lot about my weight, I feel depressed’, and ‘I hate myself for my weight’ (Fig. 4, Appendix D). Although the results for the young adult subsample did not completely overlap with these, the loadings of the three selected items were nevertheless high. For this reason, and because the three-item version of the scale has already been introduced and used, we retained these items across the remaining analyses.
Across two of the tested questionnaires (the EDE-QS and DCQ), the analyses revealed that none of the older adults (40+ years, either males, females, or both) in the present sample selected the most extreme response options on certain items. This could be resolved by collapsing the two most extreme response categories of such items into a single category containing observed responses. However, to form meaningful comparisons, we would then have to collapse the responses of the younger age group in the same way. As the younger age group provided responses in all categories across all scales, this would lead to a loss of information. To retain this information, we did not compare across age groups for these scales and instead only explored sex differences within the young adult sample.
Specifically, no males over 40 responded with ‘6–7 days’ to the questions ‘On how many of the past 7 days has thinking about food, eating or calories made it very difficult to concentrate on things you are interested in (such as working, following a conversation or reading)?’ (item 3), ‘On how many of the past 7 days has thinking about your weight or shape made it very difficult to concentrate on things you are interested in (such as working, following a conversation or reading)?’ (item 4), or ‘On how many of the past 7 days have you tried to control your weight or shape by making yourself sick (vomit) or taking laxatives?’ (item 7) of the 12-item EDE-QS. For item 7, only one woman over 40 chose the response ‘6–7 days’, and no women over 40 chose the response ‘3–5 days’. Since item 4 was also part of the shortened scale, we again only ran the analyses on the young adult group, testing for bias across sexes. Similarly, in response to the 7-item DCQ question ‘Have you ever been told by others / doctors that you are normal in spite of you strongly believing that something is wrong with your appearance or bodily functioning?’, no women over 40 chose the option ‘Much more than other people’, and only a single male over 40 did. For this reason, we only ran the measurement invariance analysis on the young age group for this scale as well.
Across all remaining measures, we tested measurement invariance across sexes and age groups (i.e., four groups: males aged 18–39, females aged 18–39, males aged 40+, females aged 40+). For the sake of consistency, we also conducted all analyses among the young adult group only, comparing the responses of males and females. The results of the measurement invariance testing procedure are presented in Table 2. Whereas RMSEA did not consistently indicate an adequate fit, the changes in the CFI and TLI indicated a good fit across all scales. In the case of the SCOFF scale, the analysis indicated that the residual covariance matrix was not positive definite when conducted on the full sample; thus, the corresponding results should not be interpreted, as the solution may not be valid. The binary nature of the items and the reduced sample sizes resulting from the multiple-group approach likely led to these estimation issues.
Descriptive statistics are presented in Table 3. McDonald’s ω t suggests that internal consistency remained comparable after removing items from each scale. Note, however, that the omega total values for the SCOFF scale are lower than expected. This likely means that the scale does not measure a single unidimensional construct but rather taps several characteristics that are only moderately related to one another on average. This holds for both the 5-item and the 3-item versions, although the 3-item version has slightly higher omega values.
Independent samples t-tests revealed that females scored significantly higher than males on all eating disorder and body dysmorphia measures in the overall sample (Table 4A) and in the young adult sample (Table 4B). Younger adults (ages 18–39) also scored significantly higher on all measures than older adults (ages 40+; Table 4C). 2 × 2 ANOVAs revealed a significant interaction between sex and age on the EDDS, BDDQ, and DCQ measures (Table 5). Specifically, while women consistently scored higher than men across all scales and both age groups, this difference was greater among young adults than among older adults. The results on all further measures (apart from the Z-transformed EDDS scores) were in the same direction (Table 5). Measures of effect size are included alongside the results of the t-tests (Cohen’s d) and ANOVAs (η p 2).
Scores from all eating disorder, body dysmorphia, and weight bias internalisation scales were positively correlated with each other in the overall sample as well as in the young adult subsample ( r s ranging from 0.40 to 0.96; Table 6). As expected, the short scales showed the strongest positive correlations with their corresponding longer versions ( r s ranging from 0.88 to 0.96). While the data we collected on depression, anxiety, and psychological distress are presented in detail elsewhere [ 9 ], bivariate correlations revealed that all these measures were significantly positively correlated with those assessing eating disorders, body dysmorphia, and weight bias internalisation ( r s ranging from 0.32 to 0.57): lower psychological distress, depression, and anxiety were related to lower levels of eating disorder, body dysmorphia, and weight bias internalisation symptoms. Due to the large size of the correlation table, these findings are presented on OSF ( https://osf.io/vg4a9/ ). Despite the significant correlations across all measures, correlations among scales designed to assess more closely related concepts (i.e., eating disorders with body dysmorphia and weight bias internalisation; depression with anxiety and psychological distress) were stronger. These results further support the discriminant and convergent validity of the scales presented here.
Throughout the analyses presented in this manuscript, we developed short versions of existing, widely used measures of eating disorders and body dysmorphia. Specifically, we aimed to identify short sets of items which capture similar information and variance as the full scales, and which correlate well with the full scales, but can be completed in less time. While using shorter measures may improve data quality across research settings by reducing unnecessary confounds such as fatigue and boredom among research participants [ 2 ], brevity may be especially important when asking participants about sensitive topics such as behaviours related to eating disorders [ 15 , 16 , 17 ]. Based on the analyses, we introduce here a 5-item short version of the 12-item EDE-QS [ 10 ], a 3-item short version of the 5-item SCOFF [ 11 ], and a 3-item short version of the 7-item DCQ [ 12 ]. We further explored the properties of the 11-item WBIS [ 14 ], a measure of weight bias internalisation, with our results supporting the validity of its recently introduced 3-item version [ 13 ].
The short version of each scale correlated strongly and positively with its longer version. Furthermore, the scales correlated positively, as expected, with alternative measures of eating disorders and body dysmorphia, though these correlations were somewhat weaker. Finally, all measures also correlated with measures of psychological distress, depression, and anxiety, but these correlations, although significant, were the weakest among those observed. As expected, an increased presence of eating disorder, body dysmorphia, and weight bias internalisation symptoms was related to increased psychological distress, depression, and anxiety. These findings support the convergent and discriminant validity of the measures.
The short scales performed similarly to the longer versions across additional analyses. The results corroborated previous findings indicating a greater prevalence of eating disorders and body image concerns among females than males [ 50 , 51 , 52 ] and among younger compared to older individuals [ 53 , 54 , 55 ]. Consistently, the difference between men and women was greater in the young adult group than in the older adult group, although this interaction did not reach significance across all analyses.
We observed measurement invariance across age and sex groups in the 11-item WBIS scale, and across sex groups among young adults in the 5-item SCOFF, the 12- and 5-item EDE-QS, the 11-item WBIS, and the 7-item DCQ. Invariance could not be formally tested for the scales with three or fewer items. We also refrained from testing it across the full sample on the 12- and 5-item EDE-QS and the 7-item DCQ, because for some items no older adults endorsed the most extreme response options. For example, no males over 40 and only one woman over 40 responded with the option ‘6–7 days’ to the question ‘On how many of the past 7 days have you tried to control your weight or shape by making yourself sick (vomit) or taking laxatives?’ This likely reflects (1) the observed difference across age groups, indicating that younger adults indeed have a greater likelihood of experiencing symptoms of eating disorders and body dysmorphia, and (2) the fact that the data were collected in a nonclinical sample of adults, in which more extreme symptoms are rare. We chose not to conduct measurement invariance testing on these scales because doing so would have required merging some response options to ensure that each option was endorsed by at least some participants. To compare the responses of older and younger adults, we would then have had to merge the responses of younger adults as well, for whom no such issue was present, ultimately leading to a loss of information. Furthermore, the results from the 5-item SCOFF scale could not be interpreted across the full sample, likely because of the binary responses and the reduced sample sizes caused by splitting across age and sex groups. Nevertheless, the lack of issues in at least one version of each scale, as indicated by measurement invariance testing, may suggest that the corresponding alternative versions of the same scales also have invariant measurement properties across the same groups. The analyses further suggest that the short scales may provide a good approximation of the longer versions, which is also supported by the high loadings of the items of the short scales on the underlying latent variables.
The analyses presented here were limited by the nature of the short scales. Some of the analyses conducted on the full scales could not be conducted on the short scales due to the number of items included; for example, measurement invariance testing cannot be implemented in scales with three or fewer items. In addition, the present analyses were conducted in a sample of UK adults, so we cannot be certain whether the results would replicate in different cultural or national contexts. Finally, it should be noted that the data were collected from the general population; we therefore cannot draw any conclusions about the performance of the scales in clinical populations.
As such, we urge future research to investigate the short measures presented here among clinical samples to further explore the contexts in which they may be used. It would also be desirable to test the performance of the short scales among groups known to differ from the general population in eating disorders and related pathologies, including lesbian, gay, bisexual, and transgender individuals [ 56 ] and gender-expansive individuals who identify outside the binary system of man or woman [ 57 ], and to assess whether the differences across such groups established through the use of alternative measures replicate. Such results would further support the construct validity of the short measures.
This manuscript presents short versions of eating disorder and body dysmorphia measures, specifically of the 12-item EDE-QS, the 5-item SCOFF, and the 7-item DCQ. It further investigates the properties of the already established 3-item version of the 11-item WBIS, a measure of weight bias internalisation. The analyses indicate that these short measures may perform comparably to their longer versions. The short measures may prove invaluable in research where the time available for any given measure is scarce, including cohort studies. Across research involving questionnaire-type measures, brevity may contribute to data quality as it eases response burden and reduces the likelihood of participants experiencing fatigue or boredom effects.
Eating Disorder Examination Questionnaire [ 10 ].
12-item scale:
On how many of the past 7 days….
Have you been deliberately trying to limit the amount of food you eat to influence your weight or shape (whether or not you have succeeded)?
Have you gone for long periods of time (e.g., 8 or more waking hours) without eating anything at all in order to influence your weight or shape?
Has thinking about food, eating or calories made it very difficult to concentrate on things you are interested in (such as working, following a conversation or reading)?
Has thinking about your weight or shape made it very difficult to concentrate on things you are interested in (such as working, following a conversation or reading)?
Have you had a definite fear that you might gain weight?
Have you had a strong desire to lose weight?
Have you tried to control your weight or shape by making yourself sick (vomit) or taking laxatives?
Have you exercised in a driven or compulsive way as a means of controlling your weight, shape or body fat, or to burn off calories?
Have you had a sense of having lost control over your eating (at the time that you were eating)?
On how many of these days (i.e., days on which you had a sense of having lost control over your eating) did you eat what other people would regard as an unusually large amount of food in one go?
Over the past 7 days….
Has your weight or shape influenced how you think about (judge) yourself as a person?
How dissatisfied have you been with your weight or shape?
Responses: Items 1–10: 0 days, 1–2 days, 3–5 days, 6–7 days; Items 11–12: Not at all, Slightly, Moderately, Markedly.
5-item scale: 4, 5, 6, 11, 12.
SCOFF [ 11 ].
5-item scale:
Do you make yourself sick because you feel uncomfortably full?
Do you worry you have lost control over how much you eat?
Have you recently lost more than one stone in a three month period?
Do you believe yourself to be fat when others say you are too thin?
Would you say that food dominates your life?
Responses: Yes/No.
3-item scale: 1, 2, 5.
Dysmorphic Concern Questionnaire [ 12 ].
7-item scale:
Have you ever:
Been very concerned about some aspect of your physical appearance?
Considered yourself to be misformed or misshapen in some way (e.g., nose / hair / skin / sexual organ / overall body build)?
Considered your body to be malfunctional in some way (e.g., excessive body odour, flatulence, sweating)?
Considered or felt that you needed to consult a plastic surgeon / dermatologist / physician about these concerns?
Been told by others / doctors that you are normal in spite of you strongly believing that something is wrong with your appearance or bodily functioning?
Spent a lot of time worrying about a defect in your appearance / bodily functioning?
Spent a lot of time covering up defects in your appearance / bodily functioning?
Responses: Not at all, Same as other people, More than most people, Much more than most people.
3-item scale: 1, 6, 7.
Modified Weight Bias Internalisation Scale [ 14 ].
11-item scale:
Because of my weight, I feel that I am just as competent as anyone.
I am less attractive than most people because of my weight.
I feel anxious about my weight because of what people might think of me.
I wish I could drastically change my weight.
Whenever I think a lot about my weight, I feel depressed.
I hate myself for my weight.
My weight is a major way that I judge my value as a person.
I don’t feel that I deserve to have a really fulfilling social life, because of my weight.
I am OK being the weight that I am.
Because of my weight, I don’t feel like my true self.
Because of my weight, I don’t understand how anyone attractive would want to date me.
Responses: 7-point Likert scale, 1 = Strongly disagree, 7 = Strongly agree.
3-item scale [ 13 ]: 3, 5, 6.
Note: items 1 & 9 are reverse scored.
Data availability
All data and corresponding syntax files are available via OSF: https://osf.io/vg4a9/ .
Note that the preregistration did not include the plan to test measurement invariance.
To indicate their sex, participants were asked to respond to the question ‘Which of the following were you described as at birth?’ by selecting one of the following options: male , female , intersex , prefer not to say.
Note that item 9 of the EDE-QS may have performed well within the short scale, as indicated by the factor loadings. Due to human error, we overlooked this when conducting the analyses. The short questionnaire has been included in the Millennium Cohort Study’s 2023 data sweep as described in this manuscript. The near perfect correlations between the long and short EDE-QS suggest that regardless of this error, the short version captures an overlapping underlying construct.
Oates J, Carpenter D, Fisher M, Goodson S, Hannah, Kwiatkowski R, et al. BPS code of human research ethics. The British Psychological Society; 2021.
Rolstad S, Adler J, Rydén A. Response burden and questionnaire length: is shorter better? A review and Meta-analysis. Value Health. 2011;14(8):1101–8.
Fernández-Ballesteros R. Self-report questionnaires. In: Haynes SN, Heiby EM, editors. Comprehensive handbook of psychological assessment: vol 3 behavioural assessment. Hoboken, NJ: Wiley; 2004. pp. 194–221.
Paulhus DL, Vazire S. The self-report method. Handbook of research methods in personality psychology. New York: Guilford; 2007. pp. 224–39.
Stanton JM, Sinar EF, Balzer WK, Smith PC. Issues and strategies for reducing the length of self-report scales. Pers Psychol. 2002;55(1):167–94.
Gardner DG, Cummings LL, Dunham RB, Pierce JL. Single-item Versus multiple-item Measurement scales: an empirical comparison. Educ Psychol Meas. 1998;58(6):898–915.
Joshi H, Fitzsimons E. The Millennium Cohort Study: the making of a multi-purpose resource for social science and policy. Longitud Life Course Stud. 2016;7(4). http://www.llcsjournal.org/index.php/llcs/article/view/410
Connelly R, Platt L. Cohort Profile: UK Millennium Cohort Study (MCS). Int J Epidemiol. 2014;43(6):1719–25.
Lantos D, Moreno-Agostino D, Harris LT, Ploubidis G, Haselden L, Fitzsimons E. The performance of long vs. short questionnaire-based measures of depression, anxiety, and psychological distress among UK adults: a comparison of the patient health questionnaires, generalized anxiety disorder scales, malaise inventory, and Kessler scales. J Affect Disord. 2023;338:433–9.
Gideon N, Hawkes N, Mond J, Saunders R, Tchanturia K, Serpell L. Development and psychometric validation of the EDE-QS, a 12 item short form of the eating disorder examination Questionnaire (EDE-Q). Takei N, editor. PLoS ONE. 2016;11(5):e0152744.
Morgan JF, Reid F, Lacey JH. The SCOFF questionnaire: assessment of a new screening tool for eating disorders. BMJ. 1999;319(7223):1467–8.
Mancuso SG, Knoesen NP, Castle DJ. The dysmorphic concern questionnaire: a screening measure for body dysmorphic disorder. Aust N Z J Psychiatry. 2010.
Kliem S, Puls HC, Hinz A, Kersting A, Brähler E, Hilbert A. Validation of a three-item short form of the Modified Weight Bias internalization scale (WBIS-3) in the German Population. Obes Facts. 2020;13(6):560–71.
Pearl RL, Puhl RM. Measuring internalized weight attitudes across body weight categories: validation of the Modified Weight Bias internalization scale. Body Image. 2014;11(1):89–92.
Matos M, Coimbra M, Ferreira C. When body dysmorphia symptomatology meets disordered eating: the role of shame and self-criticism. Appetite. 2023;186:106552.
Goss K, Allan S. Shame, pride and eating disorders. Clin Psychol Psychother. 2009;16(4):303–16.
Troop NA, Allan S, Serpell L, Treasure JL. Shame in women with a history of eating disorders. Eur Eat Disord Rev. 2008;16(6):480–8.
van de Schoot R, Kluytmans A, Tummers L, Lugtig P, Hox J, Muthén B. Facing off with Scylla and Charybdis: a comparison of scalar, partial, and the novel possibility of approximate measurement invariance. Front Psychol. 2013;4:770. https://doi.org/10.3389/fpsyg.2013.00770
van de Schoot R, Schmidt P, De Beuckelaer A, Lek K, Zondervan-Zwijnenburg M. Editorial: Measurement invariance. Front Psychol. 2015;6:1064. https://doi.org/10.3389/fpsyg.2015.01064
Hamamura T, Heine SJ, Paulhus DL. Cultural differences in response styles: the role of dialectical thinking. Personal Individ Differ. 2008;44(4):932–42.
Huang CD, Church AT, Katigbak MS. Identifying Cultural differences in items and traits: Differential Item Functioning in the NEO personality inventory. J Cross-Cult Psychol. 1997;28(2):192–218.
Reynolds CR, Altmann RA, Allen DN. The problem of bias in psychological assessment. In: Mastering modern psychological testing. Cham: Springer International Publishing; 2021. pp. 573–613. https://doi.org/10.1007/978-3-030-59455-8_15
Wicherts JM, Dolan CV, Hessen DJ, Oosterveld P, van Baal GCM, Boomsma DI, et al. Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence. 2004;32(5):509–37.
Hunt JR, White E. Retaining and tracking Cohort Study members. Epidemiol Rev. 1998;20(1):57–70.
Samet JM, Muñoz A. Evolution of the Cohort Study. Epidemiol Rev. 1998;20(1):1–14.
Stice E, Telch CF, Rizvi SL. Development and validation of the Eating Disorder Diagnostic Scale: a brief self-report measure of anorexia, bulimia, and binge-eating disorder. Psychol Assess. 2000;12(2):123–31.
Stice E, Fisher M, Martinez E. Eating Disorder Diagnostic Scale: additional evidence of reliability and validity. Psychol Assess. 2004;16(1):60–71.
Phillips K. The broken Mirror: understanding and treating body dysmorphic disorder. Oxford: Oxford University Press; 1998.
Kessler RC, Andrews G, Colpe LJ, Hiripi E, Mroczek DK, Normand SLT, et al. Short screening scales to monitor population prevalences and trends in non-specific psychological distress. Psychol Med. 2002;32(6):959–76.
Ploubidis GB, McElroy E, Moreira HC. A longitudinal examination of the measurement equivalence of mental health assessments in two British birth cohorts. Longitud Life Course Stud. 2019;10(4):471–89.
Rutter M, Tizard J, Whitmore K. Education, health and behaviour. Harlow: Longman; 1970. p. 474.
Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606–13.
Kroenke K, Spitzer RL. The PHQ-9: a New Depression Diagnostic and Severity measure. Psychiatr Ann. 2002;32(9):509–15.
Kroenke K, Spitzer RL, Williams JBW. The Patient Health Questionnaire-2: validity of a two-item Depression Screener. Med Care. 2003;41(11):1284–92.
Spitzer RL, Kroenke K, Williams JBW, Löwe B. A brief measure for assessing generalized anxiety disorder: the GAD-7. Arch Intern Med. 2006;166(10):1092.
Kroenke K, Spitzer RL, Williams JBW, Monahan PO, Löwe B. Anxiety disorders in Primary Care: prevalence, impairment, Comorbidity, and detection. Ann Intern Med. 2007;146(5):317.
Muthén LK, Muthén BO. MPlus user’s guide. 8th ed. Los Angeles, CA.: Muthén & Muthén; 1998.
Steiger JH. Structural model evaluation and modification: an interval Estimation Approach. Multivar Behav Res. 1990;25(2):173–80.
Bentler PM. Comparative fit indexes in structural models. Psychol Bull. 1990;107(2):238–46.
Tucker LR, Lewis C. A reliability coefficient for maximum likelihood factor analysis. Psychometrika. 1973;38(1):1–10.
Hu L, Bentler PM. Fit indices in covariance structure modeling: sensitivity to underparameterized model misspecification. Psychol Methods. 1998;3(4):424–53.
Barrett P. Structural equation modelling: adjudging model fit. Personal Individ Differ. 2007;42(5):815–24.
Betz NE, Turner BM. Using item response theory and adaptive testing in Online Career Assessment. J Career Assess. 2011;19(3):274–86.
Armstrong BG. Effect of measurement error on epidemiological studies of environmental and occupational exposures. Occup Environ Med. 1998;55(10):651–6.
Little TD. Longitudinal structural equation modeling. New York: The Guilford Press; 2013. p. 386. (Methodology in the social sciences).
Chen FF. Sensitivity of goodness of fit indexes to lack of Measurement Invariance. Struct Equ Model Multidiscip J. 2007;14(3):464–504.
Cheung GW, Rensvold RB. Evaluating goodness-of-fit indexes for Testing Measurement Invariance. Struct Equ Model Multidiscip J. 2002;9(2):233–55.
Hayes AF, Coutts JJ. Use Omega rather than Cronbach’s alpha for estimating reliability. But… Commun Methods Meas. 2020;14(1):1–24.
McDonald RP. Test theory: a unified treatment. Psychology Press; 1999. https://www.taylorfrancis.com/books/9781135675318
Culbert KM, Sisk CL, Klump KL. A narrative review of sex differences in eating disorders: is there a biological basis? Clin Ther. 2021;43(1):95–111.
Swanson SA, Crow SJ, Le Grange D, Swendsen J, Merikangas KR. Prevalence and correlates of eating disorders in adolescents: results from the National Comorbidity Survey Replication adolescent supplement. Arch Gen Psychiatry. 2011;68(7):714.
Udo T, Grilo CM. Prevalence and correlates of DSM-5–Defined eating disorders in a nationally Representative Sample of U.S. adults. Biol Psychiatry. 2018;84(5):345–54.
Brandsma L. Eating disorders across the Life Span. J Women Aging. 2007;19(1–2):155–72.
Koran LM, Abujaoude E, Large MD, Serpe RT. The prevalence of body dysmorphic disorder in the United States Adult Population. CNS Spectr. 2008;13(4):316–22.
Wells JE, Oakley Browne MA, Scott KM, McGee MA, Baxter J, Kokaua J. Prevalence, interference with life and severity of 12 month DSM-IV disorders in Te Rau Hinengaro: the New Zealand Mental Health Survey. Aust N Z J Psychiatry. 2006;40(10):845–54.
Parker LL, Harriger JA. Eating disorders and disordered eating behaviors in the LGBT population: a review of the literature. J Eat Disord. 2020;8(1):51.
Nagata JM, Compte EJ, Cattle CJ, Flentje A, Capriotti MR, Lubensky ME, et al. Community norms for the eating disorder examination Questionnaire (EDE-Q) among gender-expansive populations. J Eat Disord. 2020;8(1):74.
The project was funded by the CLS Resource Centre Grant, funded by ESRC (funder award reference: ES/M001660/1). DM-A is part supported by the ESRC Centre for Society and Mental Health at King’s College London [ES/S012567/1]. The views expressed are those of the authors and not necessarily those of the ESRC or King’s College London.
Author information
Authors and affiliations.
UTS Business School, University of Technology Sydney, Ultimo, Australia
Dorottya Lantos
Centre for Longitudinal Studies, Social Research Institute, UCL, London, UK
Darío Moreno-Agostino, George Ploubidis, Lucy Haselden & Emla Fitzsimons
Department of Experimental Psychology, UCL, London, UK
Lasana T. Harris
ESRC Centre for Society and Mental Health, King’s College London, London, UK
Darío Moreno-Agostino
Contributions
DL was responsible for developing the online survey, collecting and analysing data, and preparing the original draft of this manuscript. DM-A and GP contributed to the interpretation of data analyses. LH contributed to the selection of instruments and sampling. EF, LTH, and GP conceptualized the project and supervised data collection and analyses. All authors contributed to the writing process by providing critical comments and through editing.
Corresponding author
Correspondence to Emla Fitzsimons .
Ethics declarations
Ethical approval.
This study received ethical approval from the ethics committee of University College London. All participants gave consent to participate in the study and to the use of their data in scientific publication.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Supplementary Material 1
Rights and permissions.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
Cite this article.
Lantos, D., Moreno-Agostino, D., Harris, L.T. et al. The validation of short eating disorder, body dysmorphia, and Weight Bias Internalisation Scales among UK adults. J Eat Disord 12 , 137 (2024). https://doi.org/10.1186/s40337-024-01095-9
Download citation
Received : 04 August 2023
Accepted : 28 August 2024
Published : 09 September 2024
DOI : https://doi.org/10.1186/s40337-024-01095-9
Keywords: Measurement, Eating disorder, Body dysmorphia, Weight bias internalisation, Questionnaire optimisation
- Open access
- Published: 07 September 2024
unmconf: an R package for Bayesian regression with unmeasured confounders
- Ryan Hebdon 1 ,
- James Stamey 1 ,
- David Kahle 1 &
- Xiang Zhang 2
BMC Medical Research Methodology, volume 24, Article number: 195 (2024)
The inability to correctly account for unmeasured confounding can lead to bias in parameter estimates, invalid uncertainty assessments, and erroneous conclusions. Sensitivity analysis is an approach to investigate the impact of unmeasured confounding in observational studies. However, the adoption of this approach has been slow given the lack of accessible software. An extensive review of available R packages to account for unmeasured confounding lists deterministic sensitivity analysis methods, but no R packages were listed for probabilistic sensitivity analysis. The R package unmconf provides the first available implementation of probabilistic sensitivity analysis through a Bayesian unmeasured confounding model. The package allows for normal, binary, Poisson, or gamma responses, accounting for one or two unmeasured confounders from the normal or binomial distribution. The goal of unmconf is to provide a user-friendly package that performs Bayesian modeling in the presence of unmeasured confounders, with simple commands on the front end while performing more intensive computation on the back end. We investigate the applicability of this package through novel simulation studies. The results indicate that credible intervals will have near nominal coverage probability and smaller bias when modeling the unmeasured confounder(s), for varying levels of internal/external validation data and across various combinations of response and unmeasured confounder distributional families.
Introduction
Estimating the causal relationship between an exposure/treatment variable and a desired outcome is often of general interest in observational studies. While randomized clinical trials are recognized as the gold standard for investigating this relationship, observational studies have long been important in healthcare research. Since the subjects are not randomized to treatments, addressing problems due to selection bias and unmeasured confounding is vital for making appropriate inferences [ 1 , 2 ]. For the issue of unmeasured confounding, researchers in non-randomized studies often invoke the ignorability assumption (i.e., the causal framework can only address bias due to measured/observed confounders), under which any bias from unmeasured confounding is assumed to be negligible. This dismissal can lead to bias in parameter estimates and erroneous conclusions [ 3 , 4 , 5 , 6 ].
Sensitivity analysis, or quantitative bias analysis (QBA), provides tools to account for the potential existence of unmeasured confounders in observational research [ 7 , 8 , 9 ]. Under QBA, the investigator may choose to use a deterministic or probabilistic approach to quantify the potential direction and impact of the bias. Deterministic QBA specifies a range of values for bias parameters, \(\varvec{\phi }\), and then calculates the bias-adjusted estimate of the exposure effect assuming \(\varvec{\phi }\) for all combinations of the specified values of \(\varvec{\phi }\). A bias parameter is defined here to be any parameter that explains the association between the unmeasured confounder and another variable. Probabilistic QBA, on the other hand, involves the analyst’s explicit modeling of probable occurrences of various combinations of \(\varvec{\phi }\). The bias parameters are assigned a joint probability distribution (a joint prior distribution in Bayesian terminology) to depict the analyst’s uncertainty in the true value of the bias parameters. Monte Carlo sensitivity analysis (MCSA), considered a partial Bayesian analysis and a simple approach to probabilistic QBA, iteratively samples the bias parameters from the joint distribution and plugs those sampled values into the same sensitivity analysis formulas used in a fixed-value analysis [ 10 ]. Bayesian sensitivity analyses have also been proposed, where the bias parameters are assumed to be unknown and are assigned mildly informative priors in order to avoid non-identifiable models [ 11 , 12 ]. These methods all rely on additional sources of information on the relationships between the unmeasured confounder and both the response and exposure variables. This information can come from internal validation data, external validation data, or elicitation from subject matter experts.
Adoption of sensitivity analysis methods has been slow due to the lack of easily accessible software, as well as a focus of existing methods on binary outcomes and exposures. To raise awareness of the software that has been developed, a literature search of available software for sensitivity analysis for unmeasured confounders was performed on articles published since 2010 [ 13 ]. Twelve of the identified packages were implemented in R. These packages, such as treatSens [ 14 , 15 ], causalsens [ 16 ], sensemakr [ 17 ], EValue [ 18 ], and konfound [ 19 ], implement deterministic QBA. The R package episensr appears to be the only known package that performs probabilistic QBA via MCSA to account for unmeasured confounders [ 20 ]. However, only summary-level data can be supplied to this package. With record-level analysis missing from the package, Fox et al. [ 21 ] later adopted MCSA for record-level data and provided modifiable R scripts in a supplementary appendix to be tailored to an analyst’s data set. They end their discussion by encouraging the user to explore formal Bayesian approaches to overcome some obstacles in their MCSA code.
To address these limitations, we have developed an R package called unmconf that uses a Bayesian regression framework to assess the impact of unmeasured confounding in observational studies. To our knowledge, unmconf implements the first available package for a fully Bayesian sensitivity analysis and expands beyond the setting of binary outcomes and exposures. unmconf is available through CRAN at https://cran.r-project.org/web/packages/unmconf/index.html . Bayesian sensitivity analysis is often viewed as difficult to implement because it requires special Markov chain Monte Carlo (MCMC) software and checking convergence of the fitted model. unmconf overcomes these common challenges through a handful of user-friendly functions that resemble the glm() framework on the front end. The package requires that the user has Just Another Gibbs Sampler (JAGS) installed on their computer, but the user does not need to be proficient in this software.
The introduced package can facilitate sensitivity analyses by leveraging informative priors to explore the influence of unknown unmeasured confounders. Should validation data be accessible, the package enables users to adjust inferences for either one or two unmeasured confounders. In cases where more than two unmeasured confounders are present, the package generates editable JAGS code, offering scalability to accommodate additional confounders. In the “Methods” section, we briefly review the statistical model behind unmconf. The “Example” section compares our model’s results to the MCSA found in Fox et al. [ 21 ]. With this new software, we provide simulation studies to elaborate on the applicability of the package in the “Simulation” section. Lastly, in the “Discussion” section, we discuss the strengths and limitations of the underlying model in unmconf.
Motivating example
Consider a study where interest lies in the relationship between body mass index (BMI) and hypertension among adults, with age, gender, and cholesterol level as other covariates recorded in the study. In the case where \(n = 1000\), BMI, hypertension, age, and gender are fully observed for all subjects, but cholesterol is only tracked for 10% of observations. If one wished to perform causal inference using glm() in R, the model would estimate the regression coefficients with a message in the summary saying, “900 observations deleted due to missingness”. A summary will output the results, but the conclusions drawn from the subset of 100 observations may or may not accurately capture the true, latent relationship between hypertension and BMI. Additional information from the other observations is discarded due to the missing data in cholesterol. Rather than discarding the data, unmconf estimates cholesterol as part of the overall Bayesian unmeasured confounding model. Of course, in some cases the information on cholesterol may come from other sources, either data or expert opinion, and the standard glm() function cannot address the unmeasured confounder at all.
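A minimal sketch of this behaviour, with hypothetical variable names mirroring the example above:

```r
# Hypothetical data: hypertension, BMI, age, and gender fully observed;
# cholesterol recorded for only ~10% of the 1000 subjects
fit <- glm(hypertension ~ bmi + age + gender + cholesterol,
           family = binomial, data = dat)

summary(fit)  # reports "(900 observations deleted due to missingness)"
nobs(fit)     # only the ~100 complete cases are actually used
```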
The choice of methodologies for addressing unmeasured confounders relies on the availability of information regarding the unmeasured confounders and the objectives of assessing such confounding. The level of information can vary, ranging from expert opinion to either internal or external data sources. Internal validation is available when the unmeasured confounder is ascertained for a, typically, small subset of the main study data. In certain cases, only internal validation data may be accessible and information on the unmeasured confounder is only known for a subset of the patients. Provided that the internal validation data is a well representative sample, a small subsample of the unmeasured confounder variable can effectively calibrate estimation in Bayesian analysis [ 22 , 23 ]. External validation data is data from previous studies where the unmeasured confounder is fully observed. External data methods make the cautious assumption of transportability across study populations in comparing them to the main study. The external data often captures the relationship between the unmeasured confounder and the response yet lacks information on the exposure. A combination of external data and mildly informative priors can calibrate the estimation through the transportability assumption. unmconf handles both of these events for up to two unmeasured confounders. The Bayesian approach we apply here is related to missing data methods where the unmeasured confounder is treated as missing data and is imputed similar to missing at random (MAR) models.
Bayesian unmeasured confounding model
For the statistical model, we denote the continuous or discrete response Y , the binary main exposure variable X , the vector of p other perfectly observed covariates \(\varvec{C}\) , and the unmeasured confounder(s) relating to both Y and X , U . In the event of more unmeasured covariates, we denote them \(U_{1}\) , \(U_{2}\) , and so forth; these unmeasured confounders can be either binary or normally distributed.
In the scenario where there is a single unmeasured confounder, Lin et al. [ 24 ] suggest the factorization \(f(y, u|x, \varvec{c}) = f(y|x, u, \varvec{c}) f(u|x, \varvec{c})\) , which yields

$$\begin{aligned} y \mid x, \varvec{c}, u&\sim D_{y}\left( g_{y}^{-1}(\varvec{v}'\varvec{\beta } + \lambda u), \, \xi _{y}\right) , \\ u \mid x, \varvec{c}&\sim D_{u}\left( g_{u}^{-1}(\varvec{v}'\varvec{\gamma }), \, \xi _{u}\right) , \end{aligned} \qquad \qquad (1)$$
where \(\varvec{v}' = [x~\varvec{c}']'\) denotes the vector of the main exposure variable and all of the perfectly observed covariates. ( 1 ) is defined as a Bayesian unmeasured confounding model with one unmeasured confounder, where the distribution for Y pertains to the response model and the distribution for U pertains to the unmeasured confounder model. This model is completed by the specification of a link function \(g_*\) and some family of distributions \(D_{*}\) . Additional parameters for certain distributions–if any–are denoted \(\xi _{y}\) and \(\xi _{u}\) . Examples of these would be \(\sigma ^2\) for the variance of a normal distribution or \(\alpha\) for the shape parameter in the gamma distribution. For the cases of binomial and Poisson distributions, these parameters are absent. unmconf allows the user to work with a response from the normal, Poisson, gamma, or binomial distribution and unmeasured confounder(s) from the normal or binomial distribution. The package supports the identity (normal), log (Poisson or gamma), and logit (Bernoulli) link functions. Here we build a conditional distribution model on Y given treatment, perfectly observed confounders, and unmeasured confounders in addition to a marginal distribution model on U given treatment and measured confounders. The joint modeling is able to provide an adjusted estimate of the treatment-outcome effect [ 12 , 22 , 23 , 25 ]. This goes beyond previous works only allowing for either a binary or continuous response [ 13 ].
unmconf also extends beyond previous software packages by allowing for a second unmeasured confounder. For the second unmeasured confounder, the joint distribution can be factorized as \(f(y, u_{1}, u_{2}|x, \varvec{c}) = f(y|x, \varvec{c}, u_{1}, u_{2}) f(u_{1}|x, \varvec{c}, u_{2}) f(u_{2}|x, \varvec{c})\) , giving the Bayesian unmeasured confounding model:

$$\begin{aligned} y \mid x, \varvec{c}, u_{1}, u_{2}&\sim D_{y}\left( g_{y}^{-1}(\varvec{v}'\varvec{\beta } + \varvec{u}'\varvec{\lambda }), \, \xi _{y}\right) , \\ u_{1} \mid x, \varvec{c}, u_{2}&\sim D_{u_{1}}\left( g_{u_{1}}^{-1}(\varvec{v}'\varvec{\gamma } + \zeta u_{2}), \, \xi _{u_{1}}\right) , \\ u_{2} \mid x, \varvec{c}&\sim D_{u_{2}}\left( g_{u_{2}}^{-1}(\varvec{v}'\varvec{\delta }), \, \xi _{u_{2}}\right) , \end{aligned} \qquad \qquad (2)$$
where again \(\varvec{v}' = [x~\varvec{c}']'\) so that the coefficients of all perfectly observed covariates in the response model are \(\beta\) ’s, and \(\varvec{\lambda }\) denotes the coefficients on the unmeasured confounders in the response model aggregated together as \(\varvec{u} = [u_{1} \ u_{2}]'\) . For the first unmeasured confounder model in ( 2 ), the coefficients on all perfectly observed covariates are \(\gamma\) ’s, and \(\zeta\) denotes the coefficient on the second unmeasured confounder in this model. The remaining parameters in \(\varvec{\delta }\) denote the relationship between \(U_2\) and perfectly known covariates. Any parameter that models an association with \(U_{1}\) or \(U_{2}\) is defined as a bias parameter and requires either validation data or an informative prior to be estimated. The bottom two equations in ( 2 ) do not require conditional dependence between the two unmeasured confounders but rather grants the user flexibility to apply knowledge of one unmeasured confounder to estimate the other. Likewise, a multivariate distribution for \(U_1, U_2 | x, \varvec{c}\) would align with this framework. If a multivariate distribution is desired on \(U_1, U_2\) , the user would need to edit the JAGS code provided by unmconf . This same construction is generalizable to an arbitrary number of unmeasured confounders, the notation simply becomes a bit more cumbersome. We introduce the concept here, however, because the scenario of two unmeasured confounders is not uncommon, shows the general character of the construction, and illustrates how the implementation provided by unmconf works. The model in ( 2 ) is referenced throughout this paper. Given that we are interested in the exposure effect on Y conditional on \(X, \varvec{C}, U\) , \(\beta _x\) is the primary parameter of interest for this study.
Prior distributions
Prior distributions for the model parameters will be jointly defined as \(\pi (\varvec{\theta })\) , where \(\varvec{\theta } = (\varvec{\beta }, \varvec{\lambda }, \varvec{\gamma }, \zeta , \varvec{\delta })\) with bias parameters \(\varvec{\phi } = (\varvec{\lambda }, \varvec{\gamma }, \zeta , \varvec{\delta }, \xi _{u_{1}}, \xi _{u_{2}})\) . The bias parameters in an unmeasured confounder model can only be inferred through either validation data or informative priors. The default prior structure in unmconf is in Table 1 . The regression coefficients have a relatively non-informative prior with a mean of 0 and precision (inverse of the variance) of 0.1 when the response is discrete. When the response is continuous, the regression coefficients have a relatively non-informative prior with a mean of 0 and precision of 0.001. To further customize the analysis, users can specify custom priors using the priors argument within the modeling function, unm_glm() . The format for specifying custom priors is c("{parameter}[{covariate}]" = "{distribution}") . Example code eliciting informative priors is provided in “ Example ” section.
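As a purely illustrative sketch of this format, ahead of the worked example in the “Example” section, the snippet below builds a named vector of custom priors; the parameter labels and JAGS-style distribution strings are our assumptions and should be checked against the package documentation.

```r
# Illustrative only: informative priors on two bias parameters using the
# c("{parameter}[{covariate}]" = "{distribution}") format described above.
# The labels "lambda[u1]" and "gamma[x]" and the dnorm() strings are
# assumed, not taken from the package documentation.
my_priors <- c(
  "lambda[u1]" = "dnorm(0.5, 4)",  # effect of the unmeasured confounder on Y
  "gamma[x]"   = "dnorm(0.3, 4)"   # association between the exposure and U1
)

# my_priors would then be passed to unm_glm(..., priors = my_priors)
```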
Families specified for the response and unmeasured confounder(s) may involve nuisance parameters, necessitating the inclusion of their prior distributions as well. The precision parameter, \(\tau _*\), on a normal response or normal unmeasured confounder has a Gamma(0.001, 0.001) default prior; priors can also be elicited in terms of \(\sigma _*\) or \(\sigma _*^2\) through the priors argument. The nuisance parameter, \(\alpha _y\), for a gamma response has a gamma prior with both scale and rate set to 0.1. These nuisance parameters are tracked and posterior summaries are provided by default, but this can be modified.
Posterior inference
When data is observed, the scalar-valued objects in (2) become vectors and the vectors become matrices, with one element or row for each observation indexed \(i = 1, \ldots , n\). The data thus consist of the observed response values \(\varvec{y}\), exposure values \(\varvec{x}\), and covariates \({\textbf {C}}\). The exposure values and covariates can be combined columnwise to form the matrix \({\textbf {V}}\), whose rows correspond to the same observations. Similarly, the unmeasured or partially unmeasured confounders are denoted \(\varvec{u}_{1}\) and \(\varvec{u}_{2}\), perhaps aggregated together as the columns of a matrix \({\textbf {U}}\). Combining the Bayesian unmeasured confounding model with two unmeasured confounders in (2) with \(\pi (\varvec{\theta })\), we get the joint posterior distribution,
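\[
\pi\!\left(\varvec{\theta}, \varvec{u}_{1\text{mis}}, \varvec{u}_{2\text{mis}} \mid \varvec{y}, \varvec{x}, {\textbf{C}}, \varvec{u}_{1\text{obs}}, \varvec{u}_{2\text{obs}}\right) \propto \left[\prod_{i=1}^{n} f\!\left(y_{i} \mid x_{i}, \varvec{c}_{i}, u_{1i}, u_{2i}\right) f\!\left(u_{1i} \mid x_{i}, \varvec{c}_{i}, u_{2i}\right) f\!\left(u_{2i} \mid x_{i}, \varvec{c}_{i}\right)\right] \pi(\varvec{\theta}),
\]
up to a normalizing constant.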
We sample from this posterior distribution via Gibbs sampling to obtain approximate marginal posterior distributions for the parameters of interest. For modeling purposes, unmeasured confounders can be viewed as missing variables and thus are treated as parameters in the Bayesian paradigm. Bayesian computation in a missing data problem is based on the joint posterior distribution of the parameters and missing data conditioned on the modeling assumptions and observed data [ 26 ]. Using ( 2 ), we compute the joint posterior of \((\varvec{\beta }, \varvec{\lambda }, \varvec{\gamma }, \zeta , \varvec{\delta }, \varvec{u}_{1\text {mis}}, \varvec{u}_{2\text {mis}})\) given the observed \((\varvec{y}, \varvec{x}, \varvec{c}, \varvec{u}_{1\text {obs}}, \varvec{u}_{2\text {obs}})\) , where the subscripts “mis” and “obs” indicate the values that were missing and observed, respectively. The posterior simulation then uses two or three Gibbs sampling steps, depending on the number of unmeasured confounders.
Example
We illustrate the applicability of our package by first considering a simple example from Fox et al. [21], using the data provided in their paper and modeling a binary unmeasured confounder, binary response, and binary exposure. Summary-level data from the paper is displayed in Table 2. No additional measured confounders are accounted for. The relationship between the exposure and the response, accounting for the presence of an unmeasured confounder, is of interest.
Fox et al. [21] define \(P(C+|E+)\) as the probability of the unmeasured confounder among those with the exposure, \(P(C+|E-)\) as the probability of the unmeasured confounder among those without the exposure, and \(RR_{CE}\) as the relative risk. They set \(P(C+|E+) \sim \text {Beta}(10, 20)\), \(P(C+|E-) \sim \text {Beta}(5, 20)\), and \(RR_{CE} \sim \text {trapezoidal}(\text {min} = 1.5, \text {mod1} = 1.7, \text {mod2} = 2.3, \text {max} = 2.5)\) as the distributions on the bias parameters. The algorithm for their probabilistic QBA is detailed in their paper. They run their Monte Carlo sensitivity analysis (MCSA) \(m = 100,000\) times on record-level data, and the results are summarized by the median, 2.5th percentile, and 97.5th percentile of the resulting distribution of the ratio estimate; the run time was 11 minutes. We note that the summary-level code by Fox et al. [21] runs much faster, but we use their record-level code for comparison with unmconf, given that our model is built on a regression framework.
We apply the Bayesian unmeasured confounding model in ( 1 ) using unmconf . The distributions on Y and U are Bernoulli with no \(\varvec{C}\) covariates. This simplifies to
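\[
Y \mid X, U \sim \text{Bernoulli}\!\left(\text{logit}^{-1}(\beta_{1} + \beta_{x} x + \lambda u)\right), \qquad
U \mid X \sim \text{Bernoulli}\!\left(\text{logit}^{-1}(\gamma_{1} + \gamma_{x} x)\right),
\]
with \(\beta_{x}\) the log odds ratio of interest and \(\lambda\), \(\gamma_{1}\), and \(\gamma_{x}\) the bias parameters.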
Fox et al. [21] generated the unmeasured confounder by sampling from the bias parameters' distributions. With U completely missing in our model, informative priors on the bias parameters are needed for the model to converge to meaningful results. We use the information from Fox et al. [21] and apply the conditional means prior approach of [27, 28] to determine priors for the bias parameters. Note that the package uses the JAGS parameterization of the normal, that is, mean and precision, where precision is the reciprocal of the variance. Using \(P(C+|E-) \sim \text {Beta}(5, 20)\) and \(P(C+|E+) \sim \text {Beta}(10, 20)\), we induce \(\gamma _{1} \sim N(-1.5, \tau _{\gamma _{1}} = 3.7)\) and \(\gamma _x \sim N(.747, \tau _{\gamma _x} = 2.31)\) priors for the unmeasured confounder/exposure model. We also require a prior for \(\lambda\); using \(RR_{CE} \sim \text {trapezoidal}(\text {min} = 1.5, \text {mod1} = 1.7, \text {mod2} = 2.3, \text {max} = 2.5)\), we induce \(\lambda \sim N(0.99, \tau _{\lambda } = 26.70)\).
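The induced normal priors quoted here are consistent with moment-matching the logit of the Beta distributions on the bias parameters. The short check below is our own sketch of that reading, not necessarily the exact conditional means prior computation used for the paper.

```r
# Moment-match the logit of the Beta priors on P(C+|E-) and P(C+|E+); the means
# and precisions come out close to the induced normal priors quoted above.
set.seed(1)
p_minus <- rbeta(1e6, 5, 20)     # draws of P(C+ | E-)
p_plus  <- rbeta(1e6, 10, 20)    # draws of P(C+ | E+)
g1 <- qlogis(p_minus)            # implied gamma_1 = logit P(C+ | E-)
gx <- qlogis(p_plus) - g1        # implied gamma_x (log odds ratio of the confounder given exposure)
c(mean(g1), 1 / var(g1))         # roughly -1.5 and precision roughly 3.7
c(mean(gx), 1 / var(gx))         # roughly 0.75 and precision roughly 2.3
```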
We fit the model using our package across 4 chains, each with 25,000 iterations of which 4,000 were burn-in, for a total of 100,000 posterior draws. We note that the MCMC algorithm does not require 25,000 iterations for convergence; when comparing run times, we set the number of posterior sampling iterations to match what Fox et al. [21] used for the MCSA. This took about 50 seconds, compared to 11 minutes for their code. The 2.5th percentile, median, and 97.5th percentile from their simulation, along with the 95% credible interval and posterior median from unmconf, are displayed in Table 3 in terms of the odds ratio. Despite Fox et al. [21] using a trapezoidal distribution for the risk ratio versus the normal distribution we place on the logistic regression parameter, the inferences are very similar. We include the code below to convey the user-friendliness of the package.
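A sketch of the corresponding call is given below. The formulas mirror the glm()-style interface described in this paper; the data-frame name, prior labels, and argument values are our assumptions rather than a verbatim copy of the supplied code.

```r
library(unmconf)

# Record-level Fox et al. data assumed to be in `fox_df`, with binary columns
# y (response), x (exposure), and a completely missing binary confounder u1.
fit <- unm_glm(
  y ~ x + u1,                    # response model
  u1 ~ x,                        # unmeasured confounder model
  family1 = binomial(),
  family2 = binomial(),
  data    = fox_df,
  priors  = c("lambda[u1]" = "dnorm(0.99, 26.70)",   # induced priors from above
              "gamma[1]"   = "dnorm(-1.5, 3.7)",     # (label syntax assumed)
              "gamma[x]"   = "dnorm(0.747, 2.31)"),
  n.iter = 25000, n.adapt = 4000, n.chains = 4
)
```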
To investigate whether the priors on the non-bias parameters impacted the width of the intervals, we performed the analysis again with a precision of 0.01 on the regression parameters' priors instead of the default precision of 0.1. Fitting the model again via MCMC across 4 chains with 25,000 iterations, of which 4,000 were warm-up, took about 56 seconds. The 2.5th percentile, median, and 97.5th percentile from the Fox et al. [21] simulation are displayed in Table 3, in terms of the odds ratio, along with the 95% credible interval and posterior median from unmconf. As expected, the 95% credible interval is slightly wider with the more diffuse priors.
To see whether the two methods give similar results for larger samples, we multiplied the counts in Table 2 by 10. Using this larger record-level data set, we again ran the Fox et al. [21] code with 100,000 iterations; the analysis took approximately 40 minutes. The analysis using unmconf, with 25,000 iterations and a burn-in of 4,000 across 4 chains, took just under 20.5 minutes. The results from both approaches are displayed in Table 3. Not surprisingly, for this larger data set the 95% intervals are more similar than in the small-sample case.
The code supplied by Fox et al. [21] is currently structured to handle modeling with no measured covariates. If a researcher's data matches this structure, the user can conduct a sensitivity analysis without concern. If other covariates are present in a data set, the authors assume that researchers interested in conducting a simulation are proficient enough with R to modify the scripts to match their data, which may not always be the case. The modeling framework in unmconf allows the user to easily adjust the model as needed and add measured covariates: the modeling function, unm_glm(), has a structure much like glm() in R, and researchers can simply add measured covariates to the right-hand side of the model formula if desired.
A useful aspect of the unmconf package is that it provides a convenient framework for simulation studies to determine how large validation sample sizes need to be. In practice these sizes might be fixed, but researchers can see in advance how precise inference will be and whether supplementing with other external information might be needed. To assess the performance of the proposed Bayesian unmeasured confounding model under different levels of information on the unmeasured confounder(s), coverage probabilities, average bias, and average lengths of 95% quantile-based credible intervals were compared against a naive model for simulated data sets. The naive model here ignores all unmeasured confounders and is of the form
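\[
Y \mid X, \varvec{C} \sim D_{y}\!\left(g_{y}^{-1}\!\big(\beta_{1} + \varvec{v}'\varvec{\beta}\big),\ \xi_{y}\right),
\]
that is, the response model from (2) with the \(\varvec{u}'\varvec{\lambda}\) term dropped.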
In logistic regression, the difference between the parameter estimate from the naive model and the parameter estimate from the confounder-adjusted model can come from a combination of confounding bias and the noncollapsibility effect [29]. To avoid confusion about what is meant by bias, we consider the difference between the conditional odds ratio of the estimated model (either naive or corrected) and the "true" conditional odds ratio of the full model. For further discussion of noncollapsibility and of quantifying the portion of the bias that derives from confounding in logistic regression models, refer to Schuster et al. [30], Pang et al. [31], and Janes et al. [32]. We compare the Bayesian unmeasured confounding model and the naive model in the presence of internal and external validation data.
Sensitivity analysis – internal validation data
Performance metrics of coverage, bias, and length were assessed for combinations of \(n = 500, 1000, 1500, 2000\), internal validation proportions of \(p = 15\%, 10\%, 5\%, 2\%\), \(\text {response} = \texttt {norm}, \texttt {bin}, \texttt {gam}, \texttt {pois}\), \(u_{1} = \texttt {norm}, \texttt {bin}\), and \(u_{2} = \texttt {norm}, \texttt {bin}\) across \(m = 1000\) data sets, where p denotes the fraction of the main study used for validation. The data are generated as follows. First, a single, perfectly measured covariate, z, is drawn from a standard normal for the desired sample size, n. The unmeasured confounders are then generated independently from either a normal distribution with mean and variance 1 or a Bernoulli distribution with success probability 0.7, depending on the family requested in the simulation. The binary exposure variable is generated conditional on the unmeasured confounders and the perfectly measured covariate, with parameters \(\varvec{\theta }_E = (\eta _{1} = -1, \eta _z = .4, \eta _{u_{1}} = .75, \eta _{u_{2}} = .75)\) in the inverse logit link of the exposure model. Lastly, the response is generated from a normal (variance 1), Bernoulli, Poisson, or gamma (shape parameter \(\alpha _y = 2\)) distribution, according to the requested family, with parameters \(\varvec{\theta }_R = (\beta _{1} = -1, \beta _x = .75, \beta _{z} = .75, \lambda _{u_{1}} = .75, \lambda _{u_{2}} = .75)\) in the inverse link function. Thus, we have the design points \(\varvec{\theta } = (\varvec{\theta }_R, \varvec{\theta }_E)\) for our data generation. Once all variables are generated for a data set, we mark a proportion, \(1 - p\), of the unmeasured confounder observations as missing. Our JAGS model results are based on 40,000 posterior samples across 4 chains with a burn-in of 4,000. The algorithm for this simulation study is as follows (a sketch of one replication in R follows the list):
1. Select parameter values for \(\varvec{\theta }\).
2. Generate m data sets on combinations of \(n, p, u_{1}, u_{2}\) using runm() for the selected parameter values.
3. Build the model and call JAGS using unm_glm() with n.iter = 10000, n.adapt = 4000, n.thin = 1, n.chains = 4 .
4. For each data set, monitor all parameters and whether the "true" parameters are contained in the 95% credible intervals.
5. Compute the average posterior mean, average posterior standard error, coverage, average bias, and average credible interval length across replications.
6. Assess convergence of the Bayesian model.
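The sketch below writes out one replication of this design directly from the stated design points; the package's runm() automates this kind of generation, and the object names, the random validation subsample, and the commented fitting call are our own assumptions.

```r
# One replication of the internal-validation simulation for a binary response
# and two binary unmeasured confounders, using the design points given above.
set.seed(101)
n <- 1000; p <- 0.10                     # main-study size and validation fraction
z  <- rnorm(n)                           # perfectly measured covariate
u1 <- rbinom(n, 1, 0.7)                  # binary unmeasured confounders
u2 <- rbinom(n, 1, 0.7)
x  <- rbinom(n, 1, plogis(-1 + 0.4 * z + 0.75 * u1 + 0.75 * u2))              # exposure model
y  <- rbinom(n, 1, plogis(-1 + 0.75 * x + 0.75 * z + 0.75 * u1 + 0.75 * u2))  # response model
mis <- sample(n, size = round((1 - p) * n))  # confounders unobserved outside the validation subsample
u1[mis] <- NA; u2[mis] <- NA
dat <- data.frame(y, x, z, u1, u2)

# The model is then fit as in the example above, e.g. (argument names assumed):
# fit <- unm_glm(y ~ x + z + u1 + u2, u1 ~ x + z + u2, u2 ~ x + z, data = dat,
#                n.iter = 10000, n.adapt = 4000, n.thin = 1, n.chains = 4)
```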
We note that, in step 1 above, using either validation data or informative priors to set initial values for the bias parameters on the "correct side of 0" leads to better results in terms of convergence. The simulation fits models with the default priors, that is, no informative priors. Output is displayed only for a binary response and two binary unmeasured confounders; we leave the Bayesian sensitivity analysis for all other combinations of responses (normal, binary, Poisson, gamma) and unmeasured confounders (normal, binary) to the supplemental material.
For all levels of internal validation investigated, we obtain near-nominal coverage, whereas coverage for the naive model is generally well below nominal, as noted in Table 4. For small sample sizes and small internal validation proportions, the average bias is similar to that of the naive model. A study is rarely performed with only 2% internal validation at a sample size as low as \(n = 500\), but we include these results to show that the model may not be robust under extreme circumstances. The credible interval length increases substantially as the proportion of internal validation data decreases, which explains why the smaller validation samples attain higher coverage in our study. With internal validation data at 2%, the 95% credible intervals may not be of practical use given the range of values covered when the true value of \(\beta _x\) is 0.75; for instance, the average length of the 95% credible intervals when the response is binary, the two unmeasured confounders are binary, and the sample size is 500 was 3.601. As the sample size and the amount of internal validation data increase, for any combination of response and unmeasured confounders, the bias trends toward zero. For these parameter values of \(\varvec{\theta }\), the naive model overestimates the truth by a larger margin than any model in which the unmeasured confounders are accounted for; the direction of the bias may differ for other values of \(\varvec{\theta }\). The median credible interval length and median bias are also displayed in Table 4.
Sensitivity analysis – external validation data
Performance metrics of coverage, bias, and length were assessed for combinations of \(n = 500, 1000, 1500, 2000\), external validation data with \(p = 50\%, 100\%\) of the original sample size, \(\text {response} = \texttt {norm}, \texttt {bin}, \texttt {gam}, \texttt {pois}\), \(u_{1} = \texttt {norm}, \texttt {bin}\), and \(u_{2} = \texttt {norm}, \texttt {bin}\) across \(m = 1000\) data sets. For the simulation with external validation data, the main study data have no information on the unmeasured confounders. Typically, the external data carry information on the unmeasured confounder-response relationship but not on the exposure, so informative priors are needed on the bias parameters of the unmeasured confounder models to achieve convergence; that is, using (2), on \(\gamma _{1}, \gamma _x, \delta _{1}\), and \(\delta _x\). For this sensitivity analysis, we again draw z from a standard normal for the desired sample size, n. The unmeasured confounders are generated independently from either a standard normal or a Bernoulli distribution with success probability 0.5, depending on the family requested in the simulation. Using the notation for the design points, \(\varvec{\theta }\), from the "Sensitivity analysis – internal validation data" section, \(\varvec{\theta }_E = (\eta _{1} = -1, \eta _z = .4, \eta _{u_{1}} = .75, \eta _{u_{2}} = .75)\) and \(\varvec{\theta }_R = (\beta _{1} = -1, \beta _x = .75, \beta _{z} = .75, \lambda _{u_{1}} = .75, \lambda _{u_{2}} = .75)\). Once all variables are generated for a data set, we designate a proportion, \(1 - p\), of the unmeasured confounder observations as missing. We additionally generate external validation data of size np in which there is no treatment effect (i.e., \(\beta _x = 0\)), and we combine the main study data and the external validation data into one large data set. Our JAGS model results are based on 40,000 posterior samples across 4 chains, with a burn-in of 4,000. The algorithm for this simulation study is as follows:
1. Elicit informative priors on \(\gamma _X\) and \(\delta _X\).
2. Build the model and call JAGS using unm_glm() with n.iter = 10000, n.adapt = 4000, n.thin = 1, n.chains = 4 , specifying the informative priors through the priors argument.
In the algorithm above, the unmeasured confounders are either \(u_* \sim N(0, 1)\) or \(u_* \sim \text {Bernoulli}(0.5)\), depending on the family requested in the simulation. The simulation fits models with the default priors and no informative priors beyond those on \(\gamma _X\) and \(\delta _X\); here, \(\gamma _X \sim N(.65, \sigma ^2 = 1/.3^2)\) and \(\delta _X \sim N(.65, \sigma ^2 = 1/.3^2)\). We again leave the Bayesian sensitivity analysis for all other combinations of responses and unmeasured confounders to the supplemental material.
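If those priors are read as mean 0.65 with standard deviation 0.3 (so a JAGS precision of \(1/0.3^2 \approx 11.1\); this reading and the parameter labels below are our assumptions), they could be supplied roughly as follows.

```r
# Illustrative informative priors on the exposure coefficients of the two
# unmeasured confounder models; label syntax and the precision value are assumed.
ext_priors <- c(
  "gamma[x]" = "dnorm(0.65, 11.1)",   # exposure coefficient in the U1 model
  "delta[x]" = "dnorm(0.65, 11.1)"    # exposure coefficient in the U2 model
)
# ... then supplied via unm_glm(..., priors = ext_priors)
```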
For all levels of external validation investigated, we obtain near-nominal coverage. The slight over-coverage likely derives from the increase in uncertainty when the model accounts for missing data through the unmeasured confounders. The naive model underperforms in terms of coverage, as noted in Table 5. The credible interval lengths for the naive model appear relatively similar to those obtained with external validation data, but the naive model performs much worse in terms of bias, often overestimating the true effect. In contrast, external validation data supplemented with informative priors tends to remove the bias even when the external sample is only half the size of the original data, for all sample sizes tested. Average and median credible interval lengths, as well as average and median bias, are also displayed in Table 5.
When performing causal inference in observational studies, accounting for the presence of unmeasured confounders has often been overlooked by researchers due to the lack of easily accessible software. Quantitative bias analysis, both deterministic and probabilistic, can quantify the magnitude and direction of the bias on the exposure effect when ignoring the presence of an unmeasured confounder by considering possible scenarios for the unmeasured confounder. Previous work focuses on deterministic QBA, likely due to its simplicity, with no R packages published for probabilistic QBA. Fox et al. [21] contribute R scripts in the supplemental material of their work to provide the first known openly accessible probabilistic QBA with unmeasured confounders. unmconf provides a package on CRAN to help resolve the disconnect between methodology for Bayesian sensitivity analysis with unmeasured confounders and its implementation through easily accessible software. A more thorough introduction to the package is provided in the package vignette, accessible via the command vignette("unmconf", package = "unmconf"). The package is also documented via R's standard documentation system and provides several examples therein.
A limitation of the package is its inability to model more than two unmeasured confounders; however, we mitigate this by enabling the user to extract the JAGS model from the modeling function and adjust it as needed. With additional (important) unmeasured confounders, a causal inference assessment may not be feasible. This package is not meant to compete with general Bayesian regression modeling packages such as brms; rather, it addresses the specific obstacle of modeling unmeasured confounders, which brms does not. For future work, we aim to create a similar function structure using the programming language Stan, which implements Hamiltonian Monte Carlo and the No-U-Turn Sampler (NUTS) and tends to converge more quickly for high-dimensional models. We hope to have provided a detailed process that can be utilized in epidemiological research to address unmeasured confounders through the discussed Bayesian unmeasured confounding model.
Availability of data and materials
This is an R package on CRAN. The link to the package is here: https://cran.r-project.org/web/packages/unmconf/index.html . The package comes with a vignette with working examples.
References
Cochran WG. Controlling bias in observational studies: a review. Cambridge University Press; 2006. pp. 30–58. https://doi.org/10.1017/cbo9780511810725.005 .
Rosenbaum PR, Rubin DB. Reducing Bias in Observational Studies Using Subclassification on the Propensity Score. J Am Stat Assoc. 1984;79(387):516–24. https://doi.org/10.1080/01621459.1984.10478078 .
Steenland K. Monte Carlo Sensitivity Analysis and Bayesian Analysis of Smoking as an Unmeasured Confounder in a Study of Silica and Lung Cancer. Am J Epidemiol. 2004;160(4):384–92. https://doi.org/10.1093/aje/kwh211 .
Arah OA. Bias Analysis for Uncontrolled Confounding in the Health Sciences. Annu Rev Public Health. 2017;38:23–38. https://doi.org/10.1146/annurev-publhealth-032315-021644 .
Fewell Z, Davey Smith G, Sterne JAC. The impact of residual and unmeasured confounding in epidemiologic studies: a simulation study. Am J Epidemiol. 2007;166(6):646–55. https://doi.org/10.1093/aje/kwm165 .
Groenwold RHH, Sterne JAC, Lawlor DA, Moons KGM, Hoes AW, Tilling K. Sensitivity analysis for the effects of multiple unmeasured confounders. Ann Epidemiol. 2016;26(9):605–11. https://doi.org/10.1016/j.annepidem.2016.07.009 .
Schneeweiss S. Sensitivity analysis and external adjustment for unmeasured confounders in epidemiologic database studies of therapeutics. Pharmacoepidemiol Drug Saf. 2006;15(5):291–303. https://doi.org/10.1002/pds.1200 .
Uddin MJ, Groenwold RHH, Ali MS, de Boer A, Roes KCB, Chowdhury MAB, et al. Methods to control for unmeasured confounding in pharmacoepidemiology: an overview. Int J Clin Pharm. 2016. https://doi.org/10.1007/s11096-016-0299-0 .
Greenland S. Bayesian perspectives for epidemiologic research: III. Bias analysis via missing-data methods. Int J Epidemiol. 2009;38(6):1662–1673. https://doi.org/10.1093/ije/dyp278 .
Lash TL, Fox MP, MacLehose RF, Maldonado G, McCandless LC, Greenland S. Good practices for quantitative bias analysis. Int J Epidemiol. 2014;43(6):1969–85. https://doi.org/10.1093/ije/dyu149 .
McCandless LC, Gustafson P, Levy A. Bayesian sensitivity analysis for unmeasured confounding in observational studies. Stat Med. 2007;26(11):2331–47. https://doi.org/10.1002/sim.2711 .
Gustafson P, McCandless LC, Levy AR, Richardson S. Simplified Bayesian Sensitivity Analysis for Mismeasured and Unobserved Confounders. Biometrics. 2010;66(4):1129–37. https://doi.org/10.1111/j.1541-0420.2009.01377.x .
Kawabata E, Tilling K, Groenwold R, Hughes R. Quantitative bias analysis in practice: Review of software for regression with unmeasured confounding. 2022. https://doi.org/10.1101/2022.02.15.22270975 .
Carnegie NB, Harada M, Hill JL. Assessing Sensitivity to Unmeasured Confounding Using a Simulated Potential Confounder. J Res Educ Eff. 2016;9(3):395–420. https://doi.org/10.1080/19345747.2015.1078862 .
Dorie V, Harada M, Carnegie NB, Hill J. A flexible, interpretable framework for assessing sensitivity to unmeasured confounding. Stat Med. 2016;35(20):3453–70. https://doi.org/10.1002/sim.6973 .
Blackwell M. A Selection Bias Approach to Sensitivity Analysis for Causal Effects. Polit Anal. 2014;22(2):169–82. https://doi.org/10.1093/pan/mpt006 .
Cinelli C, Ferwerda J, Hazlett C. Sensemakr: Sensitivity Analysis Tools for OLS in R and Stata. SSRN Electron J. 2020. https://doi.org/10.2139/ssrn.3588978 .
VanderWeele TJ, Ding P. Sensitivity Analysis in Observational Research: Introducing the E-Value. Ann Intern Med. 2017;167(4):268. https://doi.org/10.7326/m16-2607 .
Xu R, Frank KA, Maroulis SJ, Rosenberg JM. konfound: Command to quantify robustness of causal inferences. Stata J Promot Commun Stat Stata. 2019;19(3):523–50. https://doi.org/10.1177/1536867x19874223 .
Fox MP, MacLehose RF, Lash TL. Best Practices for Quantitative Bias Analysis. In: Applying Quantitative Bias Analysis to Epidemiologic Data. Cham: Springer International Publishing; 2021. pp. 441–452. Series Title: Statistics for Biology and Health. https://doi.org/10.1007/978-3-030-82673-4_13 .
Fox MP, MacLehose RF, Lash TL. SAS and R code for probabilistic quantitative bias analysis for misclassified binary variables and binary unmeasured confounders. Int J Epidemiol. 2023. https://doi.org/10.1093/ije/dyad053 .
Faries D, Peng X, Pawaskar M, Price K, Stamey JD, Seaman JW. Evaluating the Impact of Unmeasured Confounding with Internal Validation Data: An Example Cost Evaluation in Type 2 Diabetes. Value Health. 2013;16(2):259–66. https://doi.org/10.1016/j.jval.2012.10.012 .
Stamey JD, Beavers DP, Faries D, Price KL, Seaman JW. Bayesian modeling of cost-effectiveness studies with unmeasured confounding: a simulation study. Pharm Stat. 2013;13(1):94–100. https://doi.org/10.1002/pst.1604 .
Lin DY, Psaty BM, Kronmal RA. Assessing the Sensitivity of Regression Results to Unmeasured Confounders in Observational Studies. Biometrics. 1998;54(3):948. https://doi.org/10.2307/2533848 .
Zhang X, Faries DE, Boytsov N, Stamey JD, Seaman JW. A Bayesian sensitivity analysis to evaluate the impact of unmeasured confounding with external data: a real world comparative effectiveness study in osteoporosis. Pharmacoepidemiol Drug Saf. 2016;25(9):982–92. https://doi.org/10.1002/pds.4053 .
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis. Chapman and Hall/CRC; 2013. https://doi.org/10.1201/b16018 .
Bedrick EJ, Christensen R, Johnson W. A New Perspective on Priors for Generalized Linear Models. J Am Stat Assoc. 1996;91(436):1450–60. https://doi.org/10.1080/01621459.1996.10476713 .
Christensen R, Johnson W, Branscum A, Hanson TE. Bayesian Ideas and Data Analysis. Boca Raton: CRC Press; 2010. https://doi.org/10.1201/9781439894798 .
Mood C. Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It. Eur Sociol Rev. 2010;26(1):67–82. https://doi.org/10.1093/esr/jcp006 .
Schuster NA, Twisk JWR, Ter Riet G, Heymans MW, Rijnhart JJM. Noncollapsibility and its role in quantifying confounding bias in logistic regression. BMC Med Res Methodol. 2021;21(1):136. https://doi.org/10.1186/s12874-021-01316-8 .
Pang M, Kaufman JS, Platt RW. Studying noncollapsibility of the odds ratio with marginal structural and logistic regression models. Stat Methods Med Res. 2016;25(5):1925–37. https://doi.org/10.1177/0962280213505804 .
Janes H, Dominici F, Zeger S. On quantifying the magnitude of confounding. Biostatistics. 2010;11(3):572–82. https://doi.org/10.1093/biostatistics/kxq007 .
Acknowledgements
Not applicable.
Funding
Stamey, Kahle, and Hebdon were funded by a research contract from CSL Behring.
Author information
Authors and affiliations
Department of Statistical Science, Baylor University, Waco, TX, USA
Ryan Hebdon, James Stamey & David Kahle
CSL Behring, CSL Limited, King of Prussia, PA, USA
Xiang Zhang
Contributions
R.H. wrote the main manuscript text. All authors reviewed and contributed to the manuscript. J.S. and X.Z. conceived the idea of this project. D.K. and R.H. wrote the foundation of the software package. All authors contributed to the package development. All authors have approved the submitted version of the manuscript.
Corresponding author
Correspondence to Ryan Hebdon .
Ethics declarations
Ethics approval and consent to participate, consent for publication, competing interests.
The authors declare no competing interests.
Additional information
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .
About this article
Cite this article.
Hebdon, R., Stamey, J., Kahle, D. et al. unmconf : an R package for Bayesian regression with unmeasured confounders. BMC Med Res Methodol 24 , 195 (2024). https://doi.org/10.1186/s12874-024-02322-2
Received : 19 February 2024
Accepted : 27 August 2024
Published : 07 September 2024
DOI : https://doi.org/10.1186/s12874-024-02322-2
Keywords: Bayesian methods, Unmeasured confounding, Sensitivity analysis, Epidemiology
Types of Bias in Research | Definition & Examples
Research bias results from any deviation from the truth, causing distorted results and wrong conclusions. Bias can occur at any phase of your research, including during data collection , data analysis , interpretation, or publication. Research bias can occur in both qualitative and quantitative research .
Understanding research bias is important for several reasons.
- Bias exists in all research, across research designs , and is difficult to eliminate.
- Bias can occur at any stage of the research process.
- Bias impacts the validity and reliability of your findings, leading to misinterpretation of data.
It is almost impossible to conduct a study without some degree of research bias. It’s crucial for you to be aware of the potential types of bias, so you can minimise them.
For example, in a study of a weight-loss program, the success rate will likely be affected if participants start to drop out. Participants who become disillusioned due to not losing weight may drop out, while those who succeed in losing weight are more likely to continue. This in turn may bias the findings towards more favourable results.
Table of contents
- Actor–observer bias
- Confirmation bias
- Information bias
- Interviewer bias
- Publication bias
- Researcher bias
- Response bias
- Selection bias
- How to avoid bias in research
- Other types of research bias
- Frequently asked questions about research bias
Actor–observer bias occurs when you attribute the behaviour of others to internal factors, like skill or personality, but attribute your own behaviour to external or situational factors.
In other words, when you are the actor in a situation, you are more likely to link events to external factors, such as your surroundings or environment. However, when you are observing the behaviour of others, you are more likely to associate behaviour with their personality, nature, or temperament.
One interviewee recalls a morning when it was raining heavily. They were rushing to drop off their kids at school in order to get to work on time. As they were driving down the road, another car cut them off as they were trying to merge. They tell you how frustrated they felt and exclaim that the other driver must have been a very rude person.
At another point, the same interviewee recalls that they did something similar: accidentally cutting off another driver while trying to take the correct exit. However, this time, the interviewee claimed that they always drive very carefully, blaming their mistake on poor visibility due to the rain.
Confirmation bias is the tendency to seek out information in a way that supports our existing beliefs while also rejecting any information that contradicts those beliefs. Confirmation bias is often unintentional but still results in skewed results and poor decision-making.
Let’s say you grew up with a parent in the military. Chances are that you have a lot of complex emotions around overseas deployments. This can lead you to over-emphasise findings that ‘prove’ that your lived experience is the case for most families, neglecting other explanations and experiences.
Information bias , also called measurement bias, arises when key study variables are inaccurately measured or classified. Information bias occurs during the data collection step and is common in research studies that involve self-reporting and retrospective data collection. It can also result from poor interviewing techniques or differing levels of recall from participants.
The main types of information bias are:
- Recall bias
- Observer bias
- Performance bias
- Regression to the mean (RTM)
For example, suppose you are studying the link between students' smartphone use and physical symptoms. Over a period of four weeks, you ask students to keep a journal, noting how much time they spent on their smartphones along with any symptoms like muscle twitches, aches, or fatigue.
Recall bias is a type of information bias. It occurs when respondents are asked to recall events in the past and is common in studies that involve self-reporting.
As a rule of thumb, infrequent events (e.g., buying a house or a car) will be memorable for longer periods of time than routine events (e.g., daily use of public transportation). You can reduce recall bias by running a pilot survey and carefully testing recall periods. If possible, test both shorter and longer periods, checking for differences in recall.
For example, consider a case-control study of childhood diet and childhood cancer that compares two groups:
- A group of children who have been diagnosed, called the case group
- A group of children who have not been diagnosed, called the control group
Since the parents are being asked to recall what their children generally ate over a period of several years, there is high potential for recall bias in the case group.
The best way to reduce recall bias is by ensuring your control group will have similar levels of recall bias to your case group. Parents of children who have childhood cancer, which is a serious health problem, are likely to be quite concerned about what may have contributed to the cancer.
Thus, if asked by researchers, these parents are likely to think very hard about what their child ate or did not eat in their first years of life. Parents of children with other serious health problems (aside from cancer) are also likely to be quite concerned about any diet-related question that researchers ask about.
Observer bias is the tendency of researchers to see what they expect or want to see, rather than what is actually occurring. Observer bias can affect the results in observational and experimental studies, where subjective judgement (such as assessing a medical image) or measurement (such as rounding blood pressure readings up or down) is part of the data collection process.
Observer bias leads to over- or underestimation of true values, which in turn compromises the validity of your findings. You can reduce observer bias by using double-blinded and single-blinded research methods.
Based on discussions you had with other researchers before starting your observations, you are inclined to think that medical staff tend to simply call each other when they need specific patient details or have questions about treatments.
At the end of the observation period, you compare notes with your colleague. Your conclusion was that medical staff tend to favor phone calls when seeking information, while your colleague noted down that medical staff mostly rely on face-to-face discussions. Seeing that your expectations may have influenced your observations, you and your colleague decide to conduct interviews with medical staff to clarify the observed events. Note: Observer bias and actor–observer bias are not the same thing.
Performance bias is unequal care between study groups. Performance bias occurs mainly in medical research experiments, if participants have knowledge of the planned intervention, therapy, or drug trial before it begins.
Studies about nutrition, exercise outcomes, or surgical interventions are very susceptible to this type of bias. It can be minimized by using blinding , which prevents participants and/or researchers from knowing who is in the control or treatment groups. If blinding is not possible, then using objective outcomes (such as hospital admission data) is the best approach.
When the subjects of an experimental study change or improve their behaviour because they are aware they are being studied, this is called the Hawthorne (or observer) effect . Similarly, the John Henry effect occurs when members of a control group are aware they are being compared to the experimental group. This causes them to alter their behaviour in an effort to compensate for their perceived disadvantage.
Regression to the mean (RTM) is a statistical phenomenon that refers to the fact that a variable that shows an extreme value on its first measurement will tend to be closer to the centre of its distribution on a second measurement.
Medical research is particularly sensitive to RTM. Here, interventions aimed at a group or a characteristic that is very different from the average (e.g., people with high blood pressure) will appear to be successful because of the regression to the mean. This can lead researchers to misinterpret results, describing a specific intervention as causal when the change in the extreme groups would have happened anyway.
In general, among people with depression, certain physical and mental characteristics have been observed to deviate from the population mean .
This could lead you to think that an intervention you are evaluating was effective when those treated showed improvement on measured post-treatment indicators, such as reduced severity of depressive episodes.
However, given that such characteristics deviate more from the population mean in people with depression than in people without depression, this improvement could be attributed to RTM.
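A quick way to see RTM in action is to simulate two noisy measurements with no intervention at all. The sketch below assumes an arbitrary severity scale and noise level; the "improvement" it shows is produced purely by selecting on an extreme first measurement.

```r
# Regression to the mean with no treatment: people selected for an extreme
# first measurement look "improved" on the second measurement purely by chance.
set.seed(42)
true_severity <- rnorm(10000, mean = 50, sd = 10)   # underlying severity
measure1 <- true_severity + rnorm(10000, sd = 8)    # noisy baseline measurement
measure2 <- true_severity + rnorm(10000, sd = 8)    # noisy follow-up, no intervention
selected <- measure1 > 70                           # "treat" only the most extreme cases
mean(measure1[selected]) - mean(measure2[selected]) # apparent improvement of several points
```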
Interviewer bias stems from the person conducting the research study. It can result from the way they ask questions or react to responses, but also from any aspect of their identity, such as their sex, ethnicity, social class, or perceived attractiveness.
Interviewer bias distorts responses, especially when the characteristics relate in some way to the research topic. Interviewer bias can also affect the interviewer’s ability to establish rapport with the interviewees, causing them to feel less comfortable giving their honest opinions about sensitive or personal topics.
Participant: ‘I like to solve puzzles, or sometimes do some gardening.’
You: ‘I love gardening, too!’
In this case, seeing your enthusiastic reaction could lead the participant to talk more about gardening.
Establishing trust between you and your interviewees is crucial in order to ensure that they feel comfortable opening up and revealing their true thoughts and feelings. At the same time, being overly empathetic can influence the responses of your interviewees, as seen above.
Publication bias occurs when the decision to publish research findings is based on their nature or the direction of their results. Studies reporting results that are perceived as positive, statistically significant , or favoring the study hypotheses are more likely to be published due to publication bias.
Publication bias is related to data dredging (also called p -hacking ), where statistical tests on a set of data are run until something statistically significant happens. As academic journals tend to prefer publishing statistically significant results, this can pressure researchers to only submit statistically significant results. P -hacking can also involve excluding participants or stopping data collection once a p value of 0.05 is reached. However, this leads to false positive results and an overrepresentation of positive results in published academic literature.
Researcher bias occurs when the researcher’s beliefs or expectations influence the research design or data collection process. Researcher bias can be deliberate (such as claiming that an intervention worked even if it didn’t) or unconscious (such as letting personal feelings, stereotypes, or assumptions influence research questions ).
The unconscious form of researcher bias is associated with the Pygmalion (or Rosenthal) effect, where the researcher’s high expectations (e.g., that patients assigned to a treatment group will succeed) lead to better performance and better outcomes.
Researcher bias is also sometimes called experimenter bias, but it applies to all types of investigative projects, rather than only to experimental designs .
- Good question: What are your views on alcohol consumption among your peers?
- Bad question: Do you think it’s okay for young people to drink so much?
Response bias is a general term used to describe a number of different situations where respondents tend to provide inaccurate or false answers to self-report questions, such as those asked on surveys or in structured interviews .
This happens because when people are asked a question (e.g., during an interview ), they integrate multiple sources of information to generate their responses. Because of that, any aspect of a research study may potentially bias a respondent. Examples include the phrasing of questions in surveys, how participants perceive the researcher, or the desire of the participant to please the researcher and to provide socially desirable responses.
Response bias also occurs in experimental medical research. When outcomes are based on patients’ reports, a placebo effect can occur. Here, patients report an improvement despite having received a placebo, not an active medical treatment.
While interviewing a student, you ask them:
‘Do you think it’s okay to cheat on an exam?’
Common types of response bias are:
- Acquiescence bias
- Demand characteristics
- Social desirability bias
- Courtesy bias
- Question-order bias
- Extreme responding
Acquiescence bias is the tendency of respondents to agree with a statement when faced with binary response options like ‘agree/disagree’, ‘yes/no’, or ‘true/false’. Acquiescence is sometimes referred to as ‘yea-saying’.
This type of bias occurs either due to the participant’s personality (i.e., some people are more likely to agree with statements than disagree, regardless of their content) or because participants perceive the researcher as an expert and are more inclined to agree with the statements presented to them.
Q: Are you a social person?
- Agree
- Disagree
People who are inclined to agree with statements presented to them are at risk of selecting the first option, even if it isn’t fully supported by their lived experiences.
In order to control for acquiescence, consider tweaking your phrasing to encourage respondents to make a choice truly based on their preferences. Here’s an example:
Q: What would you prefer?
- A quiet night in
- A night out with friends
Demand characteristics are cues that could reveal the research agenda to participants, risking a change in their behaviours or views. Ensuring that participants are not aware of the research goals is the best way to avoid this type of bias.
For example, suppose you interviewed patients at several points after an operation to assess their pain. On each occasion, patients reported their pain as being less than prior to the operation. While at face value this seems to suggest that the operation does indeed lead to less pain, there is a demand characteristic at play. During the interviews, the researcher would unconsciously frown whenever patients reported more post-op pain. This increased the risk of patients figuring out that the researcher was hoping that the operation would have an advantageous effect.
Social desirability bias is the tendency of participants to give responses that they believe will be viewed favorably by the researcher or other participants. It often affects studies that focus on sensitive topics, such as alcohol consumption or sexual behaviour.
You are conducting face-to-face semi-structured interviews with a number of employees from different departments. When asked whether they would be interested in a smoking cessation program, there was widespread enthusiasm for the idea.
Note that while social desirability and demand characteristics may sound similar, there is a key difference between them. Social desirability is about conforming to social norms, while demand characteristics revolve around the purpose of the research.
Courtesy bias stems from a reluctance to give negative feedback, so as to be polite to the person asking the question. Small-group interviewing where participants relate in some way to each other (e.g., a student, a teacher, and a dean) is especially prone to this type of bias.
Question order bias
Question order bias occurs when the order in which interview questions are asked influences the way the respondent interprets and evaluates them. This occurs especially when previous questions provide context for subsequent questions.
When answering subsequent questions, respondents may orient their answers to previous questions (called a halo effect ), which can lead to systematic distortion of the responses.
Extreme responding is the tendency of a respondent to answer in the extreme, choosing the lowest or highest response available, even if that is not their true opinion. Extreme responding is common in surveys using Likert scales , and it distorts people’s true attitudes and opinions.
Disposition towards the survey can be a source of extreme responding, as well as cultural components. For example, people coming from collectivist cultures tend to exhibit extreme responses in terms of agreement, while respondents indifferent to the questions asked may exhibit extreme responses in terms of disagreement.
Selection bias is a general term describing situations where bias is introduced into the research from factors affecting the study population.
Common types of selection bias are:
- Sampling or ascertainment bias
- Attrition bias
- Volunteer or self-selection bias
- Survivorship bias
- Nonresponse bias
- Undercoverage bias
Sampling bias occurs when your sample (the individuals, groups, or data you obtain for your research) is selected in a way that is not representative of the population you are analyzing. Sampling bias threatens the external validity of your findings and influences the generalizability of your results.
The easiest way to prevent sampling bias is to use a probability sampling method . This way, each member of the population you are studying has an equal chance of being included in your sample.
Sampling bias is often referred to as ascertainment bias in the medical field.
Attrition bias occurs when participants who drop out of a study systematically differ from those who remain in the study. Attrition bias is especially problematic in randomized controlled trials for medical research because participants who do not like the experience or have unwanted side effects can drop out and affect your results.
You can minimize attrition bias by offering incentives for participants to complete the study (e.g., a gift card if they successfully attend every session). It’s also a good practice to recruit more participants than you need, or minimize the number of follow-up sessions or questions.
You provide a treatment group with weekly one-hour sessions over a two-month period, while a control group attends sessions on an unrelated topic. You complete five waves of data collection to compare outcomes: a pretest survey , three surveys during the program, and a posttest survey.
Volunteer bias (also called self-selection bias ) occurs when individuals who volunteer for a study have particular characteristics that matter for the purposes of the study.
Volunteer bias leads to biased data, as the respondents who choose to participate will not represent your entire target population. You can avoid this type of bias by using random assignment – i.e., placing participants in a control group or a treatment group after they have volunteered to participate in the study.
Closely related to volunteer bias is nonresponse bias , which occurs when a research subject declines to participate in a particular study or drops out before the study’s completion.
For example, suppose you recruit volunteers for a study at a particular hospital. Considering that the hospital is located in an affluent part of the city, volunteers are more likely to have a higher socioeconomic standing, higher education, and better nutrition than the general population.
Survivorship bias occurs when you do not evaluate your data set in its entirety: for example, by only analyzing the patients who survived a clinical trial.
This strongly increases the likelihood that you draw (incorrect) conclusions based upon those who have passed some sort of selection process – focusing on ‘survivors’ and forgetting those who went through a similar process and did not survive.
Note that ‘survival’ does not always mean that participants died! Rather, it signifies that participants did not successfully complete the intervention.
For example, you might hear about famous entrepreneurs who dropped out of college and went on to build billion-dollar companies, and conclude that dropping out leads to success. However, most college dropouts do not become billionaires. In fact, there are many more aspiring entrepreneurs who dropped out of college to start companies and failed than who succeeded.
Nonresponse bias occurs when those who do not respond to a survey or research project are different from those who do in ways that are critical to the goals of the research. This is very common in survey research, when participants are unable or unwilling to participate due to factors like lack of the necessary skills, lack of time, or guilt or shame related to the topic.
You can mitigate nonresponse bias by offering the survey in different formats (e.g., an online survey, but also a paper version sent via post), ensuring confidentiality , and sending them reminders to complete the survey.
For example, suppose you survey the residents of a neighbourhood in person and receive few responses from working-age adults. You notice that your surveys were conducted during business hours, when the working-age residents were less likely to be home.
Undercoverage bias occurs when you only sample from a subset of the population you are interested in. Online surveys can be particularly susceptible to undercoverage bias. Despite being more cost-effective than other methods, they can introduce undercoverage bias as a result of excluding people who do not use the internet.
While very difficult to eliminate entirely, research bias can be mitigated through proper study design and implementation. Here are some tips to keep in mind as you get started.
- Clearly explain in your methodology section how your research design will help you meet the research objectives and why this is the most appropriate research design.
- In quantitative studies, make sure that you use probability sampling to select the participants. If you're running an experiment, make sure you use random assignment to assign your control and treatment groups (see the short sketch after this list).
- Account for participants who withdraw or are lost to follow-up during the study. If they are withdrawing for a particular reason, it could bias your results. This applies especially to longer-term or longitudinal studies .
- Use triangulation to enhance the validity and credibility of your findings.
- Phrase your survey or interview questions in a neutral, non-judgemental tone. Be very careful that your questions do not steer your participants in any particular direction.
- Consider using a reflexive journal. Here, you can log the details of each interview , paying special attention to any influence you may have had on participants. You can include these in your final analysis.
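As a small illustration of the sampling and assignment tips above (the sampling frame and group sizes here are made up):

```r
# Simple random sampling from a sampling frame, followed by random assignment
# of the sampled participants to control and treatment groups.
set.seed(2024)
frame    <- paste0("person_", 1:5000)        # hypothetical sampling frame
sampled  <- sample(frame, size = 200)        # probability (simple random) sample
assigned <- sample(rep(c("control", "treatment"), each = 100))  # random assignment
table(assigned)
```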
Other types of research bias
- Cognitive bias
- Baader–Meinhof phenomenon
- Availability heuristic
- Halo effect
- Framing effect
- Sampling bias
- Ascertainment bias
- Self-selection bias
- Hawthorne effect
- Omitted variable bias
- Pygmalion effect
- Placebo effect
Frequently asked questions about research bias
Bias in research affects the validity and reliability of your findings, leading to false conclusions and a misinterpretation of the truth. This can have serious implications in areas like medical research where, for example, a new form of treatment may be evaluated.
Observer bias occurs when the researcher's assumptions, views, or preconceptions influence what they see and record in a study, while actor–observer bias refers to situations where respondents attribute internal factors (e.g., bad character) to justify others' behaviour and external factors (e.g., difficult circumstances) to justify the same behaviour in themselves.
Response bias is a general term used to describe a number of different conditions or factors that cue respondents to provide inaccurate or false answers during surveys or interviews. These factors range from the interviewer's perceived social position or appearance to the phrasing of questions in surveys.
Nonresponse bias occurs when the people who complete a survey are different from those who did not, in ways that are relevant to the research topic. Nonresponse can happen either because people are not willing or not able to participate.
Differences among the total electron content derived by radio occultation, global ionospheric maps and satellite altimetry
- Original Article
- Open access
- Published: 11 September 2024
- Volume 98, article number 82 (2024)
- M. J. Wu ORCID: orcid.org/0000-0003-2568-6830 1,
- P. Guo 1,2,
- X. Ma 1,3,
- J. C. Xue 1,
- M. Liu 4 &
- X. G. Hu 1
In recent years, significant progress has been made in ionospheric modeling research through data ingestion and data assimilation from a variety of sources, including ground-based global navigation satellite systems, space-based radio occultation and satellite altimetry (SA). Given the diverse observing geometries, vertical data coverages and intermission biases among different measurements, it is imperative to evaluate their absolute accuracies and estimate systematic biases to determine reasonable weights and error covariances when constructing ionospheric models. This study specifically investigates the disparities among the vertical total electron content (VTEC) derived from SA data of the Jason and Sentinel missions, the integrated VTEC from the Constellation Observing System for Meteorology, Ionosphere and Climate (COSMIC) and global ionospheric maps (GIMs). To mitigate the systematic bias resulting from differences in satellite altitudes, the vertical ranges of the various VTECs are mapped to a standardized height. The results indicate that the intermission bias of SA-derived VTEC remains relatively stable, with Jason-1 serving as a benchmark for mapping other datasets. The mean bias between COSMIC and SA-derived VTEC is minimal, suggesting good agreement between these two space-based techniques. However, COSMIC and GIM VTEC exhibit remarkable seasonal discrepancies, influenced by solar activity variations. Moreover, GIMs demonstrate noticeable hemispheric asymmetry and a degradation in accuracy ranging from 0.7 to 1.7 TECU in the ocean-dominant Southern Hemisphere. While space-based observations effectively illustrate phenomena such as the Weddell Sea anomaly and longitudinal ionospheric characteristics, GIMs tend to exhibit a more pronounced mid-latitude electron density enhancement structure.
1 Introduction
Ionospheric models are essential for correcting pseudorange errors in single-frequency global navigation satellite system (GNSS) applications. Among these, the vertical total electron content (VTEC) stands out as an important ionospheric parameter for space weather investigations and real-time, high-precision positioning, such as precise point positioning real-time kinematic (PPP-RTK) services (Klobuchar 1987; Hirokawa and Fernández-Hernández 2020; Li et al. 2020). Several analysis centers provide routinely updated VTEC maps, notably the global ionospheric map (GIM), based on data from global GNSS ground stations (Hernández-Pajares et al. 2009). However, the uneven distribution of ground stations, particularly sparse over oceans and deserts, results in degraded model accuracy in these regions. Besides ground-based observations, ionospheric parameters can also be retrieved from spaceborne measurements, such as intersatellite links. GNSS/low Earth orbit (LEO) radio occultation (RO) is a well-established technique providing ionospheric soundings with low cost, global coverage and high precision. While RO offers vertical profiles of electron density, satellite altimetry (SA) observes marine areas, furnishing VTEC measurements along the nadir track via dual-frequency signals. The integration of multi-source data through data assimilation or combination methods provides a compelling opportunity to enhance operational ionospheric models (Alizadeh et al. 2011; Chen et al. 2017). Nevertheless, these data are obtained through diverse techniques and processing methods, resulting in discrepancies in resolution, coverage, accuracy and time latency among direct ionospheric products. Understanding the accuracy and bias inherent in different observation techniques is crucial for determining the weights in data combinations. Similarly, constructing an error covariance matrix in data assimilation also requires reliable accuracy information to appropriately balance the contributions from background models and realistic measurements (Bust et al. 2004; Aa and Zhang 2022).
In the earlier literature, observations from SA and RO have often been utilized as independent validation data to assess global ionospheric TEC maps and empirical models, with less emphasis on their inherent differences (Brunini et al. 2005; Cherniak and Zakharenkova 2019; Li et al. 2019; Wielgosz et al. 2021). Several studies have addressed the systematic TEC discrepancies among GNSS, spaceborne and other space-geodetic techniques: Li et al. (2019) analyzed the bias and scaling factors between International GNSS Service (IGS) GIM and spaceborne TEC, noting variations with season, local time and location; Alizadeh et al. (2011) developed GIMs from GNSS, satellite altimetry and radio occultation data, adopting empirical weighting schemes and a priori variances for various VTEC observations without detailed discussion of technique differences; Dettmering et al. (2011b) applied a variance component estimation method to account for accuracy differences among terrestrial and satellite-based GNSS, DORIS, altimetry and VLBI, focusing on a specific region around the Hawaiian Islands during a 2-week interval in 2008; and Chen et al. (2017) integrated ground- and space-based data similarly and estimated the instrumental bias and plasmaspheric component of different ionospheric data as 2-h or daily constant parameters in May 2013.
The systematic differences among TEC datasets are mainly caused by modeling errors, unknown hardware offsets and variations in observation geometries. For GIMs, the absolute VTEC may be affected by mapping function errors and estimation errors of the differential code bias (DCB). RO observations employ dual-frequency combinations of excess phase to obtain relative TEC observations, with electron density retrieved via Abel inversion. While the integral VTEC is free of DCB estimation error, it suffers from retrieval errors due to the spherical symmetry assumption. On the other hand, the dual-frequency altimeter onboard the SA platform directly measures the ionosphere in the nadir direction, allowing VTEC extraction without applying a mapping function. However, systematic biases from consecutive missions and different data versions should be handled with care when using SA TEC observations as references. Additionally, the vertical coverage of these datasets introduces inherent bias among different TEC results. Ground-based GNSS VTEC measures the total ionospheric contribution from ground stations to GNSS satellite height, while RO- and SA-derived TECs cover only the vertical range below the LEO orbit altitude. The contribution of the plasmasphere is non-negligible under certain circumstances, such as during nighttime in low solar activity (LSA) years (Jin et al. 2021). Directly comparing and assessing the accuracy of observations with such large vertical gaps is therefore a compromise.
Therefore, comprehensive research on the differences and validation of TEC observations is still needed on a global scale and over a longer duration, while also accounting for the influence of vertical range differences. This study aims to address these shortcomings and appropriately fill the vertical data gaps for the first time. The latest processed ionospheric data, JASON-1 ‘E’ (J1E), are regarded as the most reliable reference level among altimeter satellites, with a discrepancy of just 0.1 TECU compared to DORIS observations (Azpilicueta and Nava 2021 ). When conducting comparisons, we mapped the results of all the considered satellites to the J1E TEC reference frame. To mitigate the influence of orbit differences, we extrapolated the RO VTEC to the same altitude as Jason observations using an exponential profiler. Additionally, the vertical difference between RO and GIM is compensated by adding the plasmaspheric electron content above the LEO satellites. This approach enables the evaluation of TEC from different observing geometries under the same conditions and is considered to provide systematic bias-free results.
The main objectives of this research can be summarized as follows:
Investigating the intermission bias of altimetry TEC observations, including the Jason and Sentinel series, and determining the reference standard and accuracy level for global TEC comparison (Sect. 3 );
Comparing the TEC observations from satellite altimetry and Constellation Observing System for Meteorology Ionosphere and Climate (COSMIC) radio occultation with the same vertical coverage under different solar activity conditions in oceanic regions (Sect. 4);
Assessing the GIM VTEC in various latitudinal bands and ocean/land locations based on local time, seasons and solar activity levels, while considering compensated COSMIC observations that account for the plasmaspheric contribution (Sect. 5 );
Analyzing specific ionospheric characteristics revealed by different observation techniques and addressing the advantages and disadvantages of each method (Sect. 6 ).
The assessment of TEC measurements in this study may provide insights for specifying weights and error variances in data combination and data assimilation processes.
2 Data sources and methods
The data considered in this study comprise VTEC observations from the Jason-1/2/3 and Sentinel-3 satellites, COSMIC-retrieved TEC and ground-based GIM. Figure 1 provides an overview of the data coverage and maximum observing altitudes of these TEC datasets. The data continuity of GNSS ground stations is the most consistent, while Jason and COSMIC data have been available only since 2006. The satellite altitudes vary across missions, with COSMIC-1 orbiting at approximately 800 km and COSMIC-2 at 500 km after constellation deployment. Sentinel-3A, available since 2016, orbits at a similar altitude to COSMIC-1, while the Jason series remains at approximately 1330 km altitude. To encompass the solar cycle evolution and periods of overlap among different techniques, we selected the years 2008 and 2014 to represent low and high solar activity conditions. The Sentinel series of satellites has made a vital contribution to Earth monitoring in recent years. To ensure a sufficient volume of co-located observations with RO, we chose the year 2017 for comparisons among the COSMIC, Jason and Sentinel satellites. Methods for leveling the vertical coverage of the different TEC datasets to the same altitude are introduced in detail below.
Fig. 1 The satellite altitude and temporal coverage of COSMIC, Jason, Sentinel series and GNSS observations over recent decades
2.1 GIM VTEC
GIM provides global vertical TEC from the ground to GNSS satellite height and finds application in various fields such as real-time ionospheric error mitigation, space weather and climate monitoring. It is also regarded as a validation reference for other data types or empirical models (Hernández-Pajares et al. 2009). GIMs have been systematically produced and provided by the International GNSS Service ionosphere working group (IIWG) since June 1, 1998. Different ionospheric-associated analysis centers (IAACs) develop their own GIM products using various methodologies and standards. The final IGS product is a weighted combination of the individual GIMs provided by these IAACs, with specific spatial and temporal resolutions (typically \(2.5^\circ lat \times 5.0^\circ lon\), 15 min to 2 h). Assessing the accuracy and consistency of different techniques, estimations and GIM products has been crucial in recent years, considering various solar conditions, geographical locations, interpolation methods and mapping functions (Hernández-Pajares et al. 2017; Roma-Dollase et al. 2018; Li et al. 2019; Wielgosz et al. 2021). In this study, we utilize the IGS GIM with a 2-h temporal resolution. Figure 2 illustrates the IGS ground stations used to construct the ionospheric maps and the distribution of GIM VTEC. The stations are relatively denser in the Northern Hemisphere and on land in Europe and America, while sparser over oceans and in polar regions. Consequently, the accuracy of GIM in the ocean-dominated Southern Hemisphere is anticipated to degrade to some extent.
Fig. 2 The distributions of ground IGS stations (gray triangles) and the GIM VTEC. The selected time is UT 0:00 on DOY 137 of 2014
2.2 COSMIC VTEC
COSMIC/FORMOSAT-3 is a US-Taiwan joint radio occultation mission consisting of six satellites. Launched in April 2006, COSMIC has provided numerous neutral atmospheric and ionospheric sounding data, making a great contribution to operational numerical weather prediction and ionospheric research (Anthes et al. 2008). The spaceborne RO receiver receives dual-frequency signals, enabling the calculation of relative TEC along the signal trajectory from the LEO receiver to the GNSS transmitter by combining the phase measurements at the two frequencies of each system ( \(L_{1}\) and \(L_{2}\) ):
\(TEC = \frac{1}{40.3}\frac{f_{1}^{2} f_{2}^{2}}{f_{1}^{2} - f_{2}^{2}}\left( {L_{1} - L_{2} } \right)\)
where \(f_{i}\) represents the signal frequency and \(L_{i}\) is the phase measurement at each frequency. To obtain calibrated TEC, the contribution of electrons above the LEO orbit height should be excluded. This calibrated TEC represents the integral of electron density along the line of sight. Usually, the Abel transformation is applied to retrieve the electron density at each tangent point height from the TEC observations under the assumption of ionospheric spherical symmetry. The COSMIC Data Analysis and Archive Center (CDAAC) is responsible for data processing of COSMIC and several other occultation missions. A follow-on satellite mission, COSMIC-2 (FORMOSAT-7), was successfully launched on June 25, 2019 (Schreiner et al. 2020). Pedatella et al. (2021) processed and validated COSMIC-2 absolute TEC observations by comparing them with collocated Swarm-B TECs, concluding that the TEC accuracy was better than 3.0 TECU. Satellite data and products of various levels are provided in an open-access environment hosted by CDAAC. In this study, we focus on VTEC, which is obtained by integrating the electron density profile (EDP) along the vertical path. However, the EDP suffers from retrieval errors due to the ionospheric spherical symmetry assumption in the Abel inversion (Yue et al. 2010). Thus, instead of directly using the level 2 products from CDAAC, we collected COSMIC level 1b observations and performed an IRI-aided (IRI for International Reference Ionosphere) Abel inversion to estimate more precise EDPs. Under high solar activity (HSA) conditions, the new Abel inversion demonstrates a significant improvement of more than 1 TECU over the classic retrieval approach (Wu et al. 2019). The main processing procedures are summarized as follows:
Step 1, Classic Abel inversion: The COSMIC RO excess phase observations (labeled as ‘ionPhs’ file in level1b) are processed and retrieved by standard Abel inversion to obtain electron density at each tangent point;
Step 2, Parameter modeling: Three F2-layer parameters (the critical frequency (foF2), the peak density height (hmF2) and the scale height (Hsc)), and two topside parameters (the TEC above the satellites and the topside electron density), are extracted from the EDPs and modeled by spherical harmonics expansion and empirical orthogonal functions;
Step 3, IRI improvement: The modeled foF2 and hmF2 are integrated into IRI as new options for the F2 peak parameters; the scale height and adaptive topside parameters are incorporated into the IRI topside function; subsequently, an enhanced IRI model, constrained by the external RO data, is constructed (Wu et al. 2018 );
Step 4, Aided Abel inversion: The enhanced IRI model serves as a background field, providing additional horizontal gradient information. The classic Abel inversion is improved by adopting a spherical symmetric TEC constraint (Guo et al. 2015 ; Wu et al. 2019 );
Step 5, Iteration: All steps can be run iteratively to gradually reduce ionospheric retrieval errors (a schematic sketch of this loop is given below).
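For readers who prefer pseudocode, the loop below sketches how Steps 1-5 fit together. It is a minimal illustration only: the processing stages are passed in as callables because the actual retrieval routines (Abel inversion, spherical-harmonic/EOF parameter modeling, the constrained IRI) are described in the cited papers and are not reproduced here; all names and signatures are hypothetical.

```python
# Minimal sketch of the iterative IRI-aided retrieval loop (Steps 1-5).
# The stage functions are hypothetical callables supplied by the user,
# not the authors' implementation.

def retrieve_edps(ionphs_files, classic_abel, fit_models, build_iri, aided_abel,
                  n_iterations=2):
    """Return electron density profiles (EDPs) after the IRI-aided retrieval loop."""
    # Step 1: classic Abel inversion of the 'ionPhs' excess-phase observations
    edps = [classic_abel(f) for f in ionphs_files]

    for _ in range(n_iterations):  # Step 5: iterate to gradually reduce retrieval errors
        # Step 2: model foF2, hmF2, scale height and topside parameters
        f2_models, topside_models = fit_models(edps)

        # Step 3: build an enhanced IRI constrained by the RO-derived parameters
        enhanced_iri = build_iri(f2_models, topside_models)

        # Step 4: aided Abel inversion using the enhanced IRI as background field
        edps = [aided_abel(f, background=enhanced_iri) for f in ionphs_files]

    return edps
```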
Since the vertical ranges associated with the TEC obtained from ground-based GNSS receivers, COSMIC RO and satellite altimetry differ, a direct comparison of TEC from these three sources is not meaningful. To compensate for the contribution of the topside ionosphere and plasmasphere above the RO satellites, precise orbit determination (POD) observations of COSMIC (labeled as ‘podTec’ in CDAAC) are utilized to fill the TEC gap from the RO satellites up to the GNSS satellites. The distribution of RO and POD observations and their corresponding TEC are shown in Fig. 3. RO measurements exhibit good global coverage over both land and oceans. Panel (b) illustrates the spatial characteristics of plasmaspheric TEC at midnight, with its contribution potentially exceeding 10 TECU at mid- and low latitudes. It should be noted that the RO TEC is assumed to be located at the position (longitude and latitude) of the peak electron density, while the POD-derived TEC is referenced to the position of the LEO satellite. To connect the two TECs, POD STEC with an elevation angle greater than 50° is converted to VTEC using the F&K mapping function (Foelsche and Kirchengast 2002), and a global gridded model is constructed based on latitude, local time, month and year in step 2.
where \(R_{shell}\) is the radius of the effective height, taken as 2100 km here (Zhong et al. 2016); \(R_{orbit}\) is the radius of the LEO satellite orbit and \(z\) is the zenith angle of the line-of-sight ray. When comparing RO to ground-based GIM VTEC, the actual COSMIC VTEC comprises the following two components:
\({\text{VTEC}}_{{{\text{RO}}}} = TEC_{0} + TEC_{1} = \int Ne\left( h \right){\text{d}}h + {\text{VTEC}}_{{{\text{POD}}}}{\prime}\)
in which \(TEC_{0}\) represents the integral TEC of the RO electron density profile ( \(Ne\left( h \right)\) ), and \(TEC_{1}\) is the specific \(VTEC_{POD}\) matched to the position of \(TEC_{0}\) by interpolation of the POD VTEC model, denoted as \({\text{VTEC}}_{{{\text{POD}}}}{\prime}\).
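As a rough illustration of this two-component sum, the sketch below integrates an EDP and adds the interpolated plasmaspheric contribution. It assumes the profile is available as NumPy arrays and represents the gridded POD VTEC model by a simple callable; both are simplifications of the processing described above.

```python
import numpy as np

def ro_vtec_with_plasmasphere(ne_profile, heights_km, pod_vtec_model, lat, local_time, month):
    """VTEC_RO = TEC0 (integral of the RO electron density profile)
                 + TEC1 (plasmaspheric VTEC from the POD model at the RO position).

    ne_profile     : electron density (el/m^3) at the profile heights
    heights_km     : profile heights (km), ascending
    pod_vtec_model : callable (lat, local_time, month) -> plasmaspheric VTEC in TECU;
                     a stand-in for the gridded POD VTEC model built in step 2.
    """
    tec0 = np.trapz(ne_profile, heights_km * 1e3) / 1e16   # el/m^2 -> TECU
    tec1 = pod_vtec_model(lat, local_time, month)          # TECU
    return tec0 + tec1
```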
Fig. 3 The distribution of a COSMIC RO and b POD VTEC in January 2014, LT 0:00 ~ 2:00
2.3 Satellite altimetry VTEC
Jason is a collaborative oceanography serial mission of the Centre National d’Etudes Spatiales (CNES) and the National Aeronautics and Space Administration (NASA), with the aim of monitoring global ocean circulation, understanding the tie between the oceans and the atmosphere, improving global climate prediction and investigating phenomena such as El Niño conditions and ocean eddies (Lafon and Parisot 1998). Jason-1 was successfully launched in December 2001 and provided precise measurements of sea-surface elevation until June 2013. Its follow-on missions, Jason-2 and Jason-3, were launched in June 2008 and January 2016, respectively, and have continued to operate effectively since their launches. Sentinel-3 is a dedicated satellite mission under the Copernicus program, providing high-quality ocean and atmosphere measurements such as sea-surface topography, sea-surface temperature and ocean surface color. Sentinel-3A was launched on February 16, 2016, equipped with an altimeter instrument called the Synthetic Aperture Radar Altimeter (SRAL). In addition to their primary oceanography objectives, the dual-frequency altimeters aboard these satellites transmit signals vertically toward the sea surface, penetrating the ionosphere below the satellite’s orbit height. Therefore, the SA VTEC can be calculated according to the following formula:
\({\text{VTEC}}_{{{\text{SA}}}} = \frac{f^{2} \cdot dR}{40.3}\)
where \(dR\) represents the ionospheric range correction of the Ku or C band, and \(f\) is the band frequency. For Jason, the \(dR\) is provided in the geophysical data records and is available for registered users at the CNES website https://aviso-data-center.cnes.fr . The ionospheric correction products of Sentinel-3 in the marine areas are operationally managed by the EUMETSAT Sentinel-3 Marine Centre ( https://www.eumetsat.int/eumetsat-data-centre ). The original ionospheric correction may be affected by instrument noise; therefore, we applied a filter of 20–25 samples along the orbit track, as recommended by the product handbook and Imel ( 1994 ). Previous studies have combined satellite altimetry TEC observations with other data types to construct global or regional ionospheric models (Alizadeh et al. 2011; Dettmering et al. 2011a; Yao et al. 2018), or treated SA data as a means of accurate validation for GNSS and RO TECs, albeit with systematic bias (Li et al. 2019; Pedatella et al. 2021).
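A minimal sketch of this conversion and smoothing is given below. The Ku-band frequency value and the use of a running median are assumptions made for illustration; the text only specifies a 20–25-sample filter, and the exact filtering is defined in the product handbooks.

```python
import numpy as np

KU_BAND_HZ = 13.575e9   # nominal Ku-band frequency, assumed here for illustration

def altimeter_vtec(dr_iono_m, freq_hz=KU_BAND_HZ, window=25):
    """Convert the ionospheric range correction dR (metres) of one pass to VTEC (TECU)
    and smooth it along track with a running median of ~25 samples (a stand-in for
    the filtering recommended by the product handbooks)."""
    dr = np.asarray(dr_iono_m, dtype=float)
    vtec = np.abs(dr) * freq_hz**2 / 40.3 / 1e16            # el/m^2 -> TECU

    half = window // 2
    padded = np.pad(vtec, half, mode="edge")
    return np.array([np.median(padded[i:i + window]) for i in range(vtec.size)])
```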
Since COSMIC and satellite altimetry measurements have different spatial and temporal distributions, the collocated comparison pairs are matched with the following criteria: a spatial difference within \(2.5^\circ\) and a time difference within 20 min. The distribution of matched footprints during an entire Jason-1 cycle (approximately 10 days) is depicted in Fig. 4. Due to differences in orbit design, fewer matched points are available in the equatorial area and near the poles. Sentinel-3 operates at a similar orbit altitude to the COSMIC-1 satellites, allowing for direct comparison of TEC. Nevertheless, the difference in orbit height between the COSMIC-1 and Jason satellites (800 km and 1300 km, respectively) introduces systematic deviations in their TECs. To mitigate the influence of orbit differences, we extrapolated the COSMIC VTEC to the same altitude as the Jason observations using an exponential profiler. The electron density above the COSMIC satellite can be represented by
\(Ne\left( h \right) = Ne\left( {h_{RO} } \right)\exp \left( { - \frac{{h - h_{RO} }}{{H_{P} }}} \right)\)
where \(h_{RO}\) is the orbit altitude of the COSMIC satellite, \(Ne\left( {h_{RO} } \right)\) is the electron density at \(h_{RO}\) and \(H_{P}\) is the plasmaspheric scale height, which was calculated in our earlier work from eight years of COSMIC electron density profiles and POD TEC (Wu et al. 2021). Then, the VTEC with the same vertical range as the Jason observations is obtained as follows:
\({\text{VTEC}}_{{{\text{RO}}}}^{\prime } = TEC_{0} + \int_{{h_{RO} }}^{{h_{{{\text{ALT}}}} }} {Ne\left( h \right){\text{d}}h}\)
in which \(h_{{{\text{ALT}}}}\) is the orbit altitude of the Jason satellite.
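Under the exponential-profile assumption, the integral between the two orbit heights has a simple closed form, which the sketch below uses to level a COSMIC VTEC value to the Jason altitude. The scale height is a plain input here; in the paper it comes from the authors' earlier COSMIC-based model, and the numeric values in the usage comment are placeholders.

```python
import numpy as np

def extend_ro_vtec_to_altimeter(tec0_tecu, ne_top_el_m3, h_ro_km, h_alt_km, hp_km):
    """Add the analytic integral of Ne(h) = Ne(h_RO) * exp(-(h - h_RO)/H_P) between
    the COSMIC orbit height h_RO and the altimeter orbit height h_ALT (in TECU)."""
    dh_m = (h_alt_km - h_ro_km) * 1e3
    hp_m = hp_km * 1e3
    topside_tecu = ne_top_el_m3 * hp_m * (1.0 - np.exp(-dh_m / hp_m)) / 1e16
    return tec0_tecu + topside_tecu

# Placeholder usage (illustrative numbers only):
# extend_ro_vtec_to_altimeter(tec0_tecu=20.0, ne_top_el_m3=1e10,
#                             h_ro_km=800.0, h_alt_km=1330.0, hp_km=1000.0)
```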
Fig. 4 The collocated TEC measurements for COSMIC (black pentagram) and Jason-1 (colormap) between December 17 and 27, 2008, covering an entire cycle of Jason passes
3 Intermission bias of satellite altimetry VTEC
The systematic bias issue in altimetry TEC observations has persisted since the launch of TOPEX/Poseidon in the 1990s and continues throughout subsequent Jason missions. Differences between satellites and various versions are caused by changes in models and algorithms applied in each round of processing. To investigate the discrepancy in Jason-1, Jason-2, Jason-3 and Sentinel-3 VTEC measurements, we selected available VTEC in the same period from 2008 to 2018 (encompassing an entire solar cycle) and calculated the mean TEC for each mission (see Fig. 5 ). The latest and reprocessed data versions were utilized where available, denoted as ‘E’ for Jason-1, ‘D’ for Jason-2 and ‘F’ for Jason-3 (Azpilicueta and Nava 2021 ), and the 2019 reprocessed version for Sentinel-3. According to Fig. 5 , the Jason-2 VTEC is higher than all other missions throughout the last solar cycle. During the overlap periods of Jason-1 and Jason-2 from 2008 to 2010, VTEC differences are relatively stable, ranging from 2.939 to 3.502 TECU. The differences seem less correlated with the solar activity variations, since the discrepancy from 2008 to 2013 is not enlarged even though the absolute TEC notably increases under the high solar activity conditions. From 2016 onwards, we have simultaneous measurements from Jason-2, Jason-3 and Sentinel-3. Results indicate that Jason-2 VTEC remains highest among all satellite series (Wielgosz et al. 2021 ). The mean VTEC of Sentinel-3 is higher than that of Jason-3 but lower than that of Jason-2. In the year 2017, the average discrepancy of Jason-2 and Jason-3 is 3.52 TECU, while decreasing to 1.67 TECU between Jason-2 and Sentinel-3.
Fig. 5 The mean TEC of Jason-1, Jason-2, Jason-3 and Sentinel-3. The mean F10.7 index in each year is shown by the dashed red line according to the right y-axis
The systematic bias observed in intermission comparisons for the Jason satellites is consistent with results obtained from the official validation and cross-calibration activities. According to annual reports of the different Jason missions and versions, the relationships between Jason-1/2/3 are as follows: \({\text{Jason2}}_{D} \approx {\text{Jason1}}_{E} + 3.4\;{\text{TECU}}\) and \({\text{Jason3}}_{T} \approx {\text{Jason2}}_{D} - 2.5\;{\text{TECU}}\), where the subscript represents the data version (Roinard and Lievin 2017; Roinard and Michaud 2020). However, the TEC of the latest Jason-3 version, denoted as \({\text{Jason3}}_{F}\), shows a mean difference of about 3.98 TECU lower than \({\text{Jason2}}_{D}\). Azpilicueta and Nava (2021) concluded that \({\text{Jason1}}_{E}\) was the most accurate, with the least absolute bias of approximately 1 TECU relative to DORIS. Therefore, Jason-1 VTEC can be considered a standard for mapping other datasets into the \({\text{Jason1}}_{E}\) reference frame. In the following study, we apply an intermission correction based on the mean differences obtained from Fig. 5 to eliminate the systematic bias from the Jason and Sentinel TEC observations:
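The calibration relation itself (Eq. 7) is not reproduced in this extract. Purely as an illustrative stand-in, the sketch below applies constant offsets derived from the mean differences quoted above; the numbers are indicative only and are not the paper's exact calibration values.

```python
# Illustrative offsets (TECU) mapping each mission into the Jason-1 'E' frame,
# derived from the mean differences quoted in the text
# (Jason2_D ~ Jason1_E + 3.4; Jason3_F ~ Jason2_D - 3.98; Sentinel-3 ~ Jason2_D - 1.67).
OFFSET_TO_J1E = {
    "JASON1_E":  0.0,
    "JASON2_D": -3.4,
    "JASON3_F": -3.4 + 3.98,   # roughly +0.58
    "SENTINEL3": -3.4 + 1.67,  # roughly -1.73
}

def to_j1e_frame(vtec_tecu, mission):
    """Map an altimetry VTEC value (TECU) into the Jason-1 'E' reference frame."""
    return vtec_tecu + OFFSET_TO_J1E[mission]
```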
4 Comparison between satellite altimetry and RO VTEC
4.1 The daily differences between Jason and COSMIC VTEC
There is an overlap of approximately 6 months between Jason-1 and Jason-2 in 2008, as Jason-2 was launched in July of that year. We collected the available TEC data from both Jason missions and matched them with COSMIC VTEC measurements. \({\text{VTEC}}_{{{\text{RO}}}}^{\prime }\) with topside compensation was utilized, assuming no remaining bias due to the altitude difference between the RO and SA satellites. The mean bias and standard deviation (STD) were calculated for each day based on all differences of coincident COSMIC and Jason TEC, as shown in Fig. 6. According to the mean bias represented by the green line, the VTEC values of COSMIC and Jason-1 agree well with each other in 2008, with biases of less than 1 TECU. However, later that year, when directly comparing to the original Jason-2 VTEC (represented by the brown line), the RO results demonstrate a notable underestimation, with COSMIC-Jason residuals varying around -3 TECU. This confirms that the systematic bias between Jason missions can reach several TEC units. Given the systematic bias of altimetry missions studied in Sect. 3, the application of the correction between Jason-1 and Jason-2 is demonstrated in panel (b), where ‘JASON2c’ is mapped to the Jason-1 reference frame by subtracting 3.4 TECU from the original Jason-2 values. Consequently, COSMIC demonstrates very good agreement with both Jason ionospheric products. The STD of the VTEC among different missions is not influenced by the systematic bias, and COSMIC differs from Jason by 1 ~ 3 TECU. Seasonal variations are not pronounced under LSA conditions.
Fig. 6 The variations in VTEC differences among COSMIC, Jason-1 and Jason-2 in 2008. Panels a and b are the daily mean bias, and ‘JASON2c’ indicates the corrected Jason-2 VTEC; panel c is the STD
Figure 7 shows similar results but under higher solar activities in 2014 when only Jason-2 data are available. The bias of COSMIC increases to -5 TECU during the equinox seasons in comparison with the Jason-2 VTEC. However, when the Jason-2 observations are corrected by systematic bias, the underestimation of COSMIC VTEC is significantly mitigated. Given the absolute accuracy of Jason-1 TEC measurements, COSMIC RO data can be considered high-precision ionospheric measurements under different solar activity conditions. The seasonal variations in STD are more pronounced in 2014, with greater deviations and higher discrepancies in spring and autumn, during the equinox seasons.
Fig. 7 The variations in VTEC differences between COSMIC and Jason-2 in 2014. Panels a and b are the daily mean bias, and ‘JASON2c’ indicates the corrected Jason-2 VTEC; panel c is the STD
4.2 The characteristics of the COSMIC-Jason bias
The collocated observations of COSMIC and Jason are relatively limited due to the sparse distribution of occultation measurements. A climatological comparison of COSMIC/Jason-retrieved VTEC is conducted by calculating seasonal mean differences between these two observations. The seasons are represented by the March equinox (‘ME’) (March, April and May), June solstice (‘JS’) (June, July and August), September equinox (‘SE’) (September, October and November) and December solstice (‘DS’) (January, February and December) in this study. Figures 8 and 9 illustrate the global distribution of VTEC residuals (COSMIC-Jason) for 2008 and 2014, respectively. The quarterly COSMIC and Jason TECs are averaged over each grid cell of 5° latitude and 10° longitude. The difference is determined as the COSMIC grid mean value minus Jason’s. During the selected daytime periods (LT 12–16) in the LSA year (Fig. 8), the VTEC of COSMIC is evidently overestimated in the equatorial and mid-latitude areas, along the magnetic inclination field lines. Crest-trough-like structures are identified in most seasons, which may be attributed to the retrieval errors of the RO observations. When the equatorial ionization anomaly (EIA) is well developed around noon, the Abel retrieval method tends to overestimate the electron density to the north and south of the EIA crests ( \(\pm 30^\circ\) ~ \(\pm 50^\circ\) ) and at the geomagnetic equator, while underestimating the electron density in the region surrounding the troughs ( \(\pm 10^\circ\) ~ \(\pm 30^\circ\) ). This phenomenon results in three pseudo peaks and two depletions along the magnetic inclination lines (Wu et al. 2019; Yue et al. 2010). Moreover, the VTEC residuals demonstrate seasonal variations consistent with the EIA evolution, presenting an equinox asymmetry with larger crests and troughs in the spring equinox compared to the autumn equinox (Yue et al. 2015). During nighttime, the absolute differences decrease because the VTEC values are much lower than during daytime. In 2014, when solar activity was more intense, the VTEC difference increased with the growing strength of the EIA crests and troughs according to Fig. 9. The equinox asymmetry is much more predominant, showing greater differences in the March equinox season than in the September equinox. During LT 0 ~ 4, the COSMIC VTEC is generally less than the Jason results between \(\pm 30^\circ\) of geomagnetic latitude. In particular, the discrepancy of VTEC illustrates minor hemispheric asymmetry without obvious degradation in the ocean-dominant Southern Hemisphere. This also proves that, with the compensation of topside TEC above 800 km, COSMIC VTEC shows good agreement with the Jason-1 and Jason-2 measurements under both LSA and HSA conditions. It indicates the feasibility of leveling the vertical range of different LEO satellites as introduced in Eq. (6).
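The seasonal gridded difference maps can be computed as sketched below, assuming the matched COSMIC/Jason pairs for one season and local-time window have already been assembled into flat arrays; the binning conventions are assumptions.

```python
import numpy as np

def seasonal_grid_difference(lat, lon, vtec_ro, vtec_sa, dlat=5.0, dlon=10.0):
    """Grid-mean COSMIC minus Jason difference on a dlat x dlon grid.
    Inputs are 1-D arrays of matched pairs for one season / local-time window."""
    lat_edges = np.arange(-90.0, 90.0 + dlat, dlat)
    lon_edges = np.arange(-180.0, 180.0 + dlon, dlon)

    def grid_mean(values):
        s, _, _ = np.histogram2d(lat, lon, bins=[lat_edges, lon_edges], weights=values)
        n, _, _ = np.histogram2d(lat, lon, bins=[lat_edges, lon_edges])
        with np.errstate(invalid="ignore"):
            return np.where(n > 0, s / n, np.nan)

    return grid_mean(vtec_ro) - grid_mean(vtec_sa)
```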
Fig. 8 The mean difference between COSMIC and Jason-1 VTEC in 2008, LT 12–16 and LT 0–4. The white dashed and solid lines are magnetic inclination contour lines at \(20^\circ\) intervals
Fig. 9 Same as in Fig. 8 but for 2014. The corrected Jason-2 data are used
4.3 The comparison among Jason, Sentinel and COSMIC VTEC
The comparison involving the Sentinel series was conducted for 2017, during which COSMIC-1 data were still available, although in lower volume. Given that Sentinel-3A shares a similar orbit altitude with COSMIC, no compensation was necessary when comparing these two datasets. Systematic bias was excluded by applying the calibration summarized in Eq. (7), which also corrected the Jason-2 and Jason-3 TEC measurements accordingly. Consequently, all satellite altimetry ionospheric products were mapped into the relatively accurate frame of Jason-1 version ‘E.’ Figure 10 shows the histograms of each dataset compared to COSMIC VTEC during both daytime and nighttime. The colocation criterion was set at 2° of latitude, 6° of longitude and within 30 min. This criterion is slightly less strict than before to accommodate more matched observations for statistical analysis. According to Fig. 10, the agreement between COSMIC and the Jason series is robust, with a correlation coefficient exceeding 0.9 during daytime. The mean biases between COSMIC and the corrected Jason VTEC are 0.55 and 0.75 TECU during daytime, and decrease to 0.27 and 0.25 TECU at nighttime. The average difference between COSMIC and the corrected Sentinel-3 TEC is about -0.5 TECU, with the STD being larger than in the Jason comparison, especially at night. In conclusion, the mean bias and STD derived from collocated RO and SA VTEC measurements affirm the comparable accuracy of these two datasets.
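A brute-force version of this colocation and of the bias/STD/correlation statistics is sketched below; it is an illustration of the matching thresholds quoted in the text, not the authors' code, and it simply keeps the closest-in-time candidate for each RO sample.

```python
import numpy as np

def match_and_compare(lat1, lon1, t1, vtec1, lat2, lon2, t2, vtec2,
                      dlat=2.0, dlon=6.0, dt_min=30.0):
    """Colocate two VTEC datasets (2 deg latitude, 6 deg longitude, 30 min) and
    return the mean bias, STD and correlation of the matched pairs.
    Inputs are NumPy arrays; times are numpy.datetime64."""
    v1, v2, diffs = [], [], []
    for i in range(lat1.size):
        dt = np.abs((t2 - t1[i]) / np.timedelta64(1, "m"))
        dlon_wrap = np.abs((lon2 - lon1[i] + 180.0) % 360.0 - 180.0)
        ok = (np.abs(lat2 - lat1[i]) <= dlat) & (dlon_wrap <= dlon) & (dt <= dt_min)
        if ok.any():
            j = int(np.argmin(np.where(ok, dt, np.inf)))   # closest-in-time candidate
            v1.append(vtec1[i]); v2.append(vtec2[j]); diffs.append(vtec1[i] - vtec2[j])

    diffs = np.asarray(diffs)
    corr = np.corrcoef(v1, v2)[0, 1] if diffs.size > 1 else np.nan
    return diffs.mean(), diffs.std(), corr
```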
Fig. 10 The comparison of VTEC between COSMIC, Jason and Sentinel-3 in 2017. The corrected Jason and Sentinel results are used (denoted as ‘JASON2c,’ ‘JASON3c’ and ‘SENTINEL3c’)
5 The global differences between COSMIC and GIM VTEC
Given that RO VTEC can be considered an accurate reference, we will now discuss the differences between COSMIC and GIM. A number of validation and evaluation studies have been performed since the COSMIC launch in 2006. This section focuses on a detailed discussion of the ocean-land and hemispheric diversity in the systematic bias between RO and ground-based GIM observations. The comparison is conducted for two distinct years, 2008 and 2014, to capture the scenarios of low and high solar activity. The daily mean and STD of the COSMIC-GIM VTEC difference are shown in Fig. 11. The COSMIC VTEC has been adjusted to the same vertical range as the GIM VTEC. Notable seasonal variations are observed in both years, with higher STDs during the spring and autumn equinoxes, and smaller deviations during the solstice seasons. In 2008, the bias varies between -0.5 and 0.5 TECU, with minor day-to-day fluctuations. The daily F10.7 indexes are generally stable, except for a few instances of elevated values during the March equinox. The VTEC STD ranges between 1.0 and 2.5 TECU, with the lowest deviations during the June solstice season. As solar activity intensifies in 2014, both the bias and STD increase significantly, especially during the equinox seasons. The seasonal variations in the VTEC difference are more predominant, with the STD occasionally ranging from 3 TECU to as high as 10 TECU. Moreover, the STD exhibits monthly periodic variations similar to those of the F10.7 index under high solar activity conditions, especially during the solstice seasons.
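Sampling the 2-hourly GIM at each RO footprint can be sketched as below, assuming the maps have already been read from the IONEX files into a NumPy array and that the latitude/longitude grids are stored in ascending order. Plain bilinear spatial and linear temporal interpolation is used; the rotated (sun-fixed) interpolation sometimes recommended for IONEX maps is ignored for brevity.

```python
import numpy as np

def gim_at_point(gim_maps, map_epochs_h, lats, lons, lat, lon, t_h):
    """Interpolate a stack of GIM VTEC maps (shape: n_epochs x n_lat x n_lon) to one
    footprint: bilinear in (lat, lon), linear in time. Grids must be ascending."""
    k = int(np.clip(np.searchsorted(map_epochs_h, t_h) - 1, 0, len(map_epochs_h) - 2))
    w = (t_h - map_epochs_h[k]) / (map_epochs_h[k + 1] - map_epochs_h[k])

    def bilinear(vtec_map):
        i = int(np.clip(np.searchsorted(lats, lat) - 1, 0, len(lats) - 2))
        j = int(np.clip(np.searchsorted(lons, lon) - 1, 0, len(lons) - 2))
        u = (lat - lats[i]) / (lats[i + 1] - lats[i])
        v = (lon - lons[j]) / (lons[j + 1] - lons[j])
        return ((1 - u) * (1 - v) * vtec_map[i, j] + u * (1 - v) * vtec_map[i + 1, j]
                + (1 - u) * v * vtec_map[i, j + 1] + u * v * vtec_map[i + 1, j + 1])

    return (1 - w) * bilinear(gim_maps[k]) + w * bilinear(gim_maps[k + 1])

# Daily statistics then follow directly from the residuals, e.g.
# bias, std = residuals.mean(), residuals.std()  with residuals = vtec_ro - vtec_gim
```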
Fig. 11 The seasonal variations in VTEC differences between COSMIC and GIM ( \(VTEC_{RO}\) -GIM) in 2008 and 2014. The daily bias and STD are shown according to the left y-axis, and the F10.7 flux is plotted according to the right y-axis
The ionospheric variations and physical characteristics across different latitudes are distinctive; therefore, we further investigated the latitude-dependent performances of RO VTEC with respect to GIM. According to Chen et al. ( 2020 ), the quality of GIM VTEC differs in the Southern and Northern Hemispheres at different latitudes, due to the influence of the marine and land distribution, as well as the density of GNSS tracking stations. The IGS combined maps are used since they are slightly better than any of the individual IAAC maps (Hernández-Pajares et al. 2009 ). The residuals between COSMIC and GIM are discussed in several geomagnetic latitude bands, \(- 15^\circ\) ~ \(15^\circ\) , \(\pm 15^\circ\) ~ \(\pm 45^\circ\) and \(\pm 45^\circ\) ~ \(\pm 90^\circ\) , representing low, mid- and high latitudes, respectively. Additionally, the surface types of oceans and land are separated in the statistics.
Figure 12 illustrates the bias and STD of the VTEC residual between COSMIC and GIM across different geomagnetic latitudes and local time periods in 2008. The difference between RO and GIM data is more noticeable in low- and mid-latitude regions and less discernible at higher latitudes. Generally, the bias of the VTEC between \(- 15^\circ N\) and \(15^\circ N\) is below 1.5 TECU, while the STD ranges from 1 TECU to over 2.5 TECU, varying with local time. The absolute VTECs in equatorial areas are greater due to the enhanced solar radiation and the equatorial ionization anomaly. Diurnal variations are independent of latitude, with lower bias and STD observed during nighttime and higher values during daytime globally. Notably, disparities between the Northern and Southern Hemispheres are apparent, particularly in terms of VTEC bias, with residuals in mid-latitude areas increasing in the Southern Hemisphere. Given that the spaceborne technique is not constrained by the spatial–temporal distribution of measurements, the discrepancy between hemispheres may be attributed to the degradation in accuracy of GIM in the Southern Hemisphere, which is primarily ocean-dominated and has significantly fewer ground-based stations. Generally, the STD over land is smaller than that over oceans. Figure 13 demonstrates similar findings but under high solar activity conditions. The equatorial STD increases to 8 TECU around noon and even higher at mid-latitudes. The VTEC residuals are smaller at high latitudes, especially in the Northern Hemisphere, with a bias of less than −1 TECU. The hemispheric asymmetry in accuracy is exacerbated, and the disparity in STD between ocean and land samples increases significantly, occasionally reaching 1 ~ 2 TECU during daytime.
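The band-wise statistics behind Figs. 12 and 13 can be sketched as follows. The geomagnetic-latitude values and the land/ocean mask are supplied by the caller, since the paper does not name the tools used for either; the band edges follow the text.

```python
import numpy as np

BANDS = {"low": (0.0, 15.0), "mid": (15.0, 45.0), "high": (45.0, 90.0)}

def band_statistics(residuals, maglat, lat, lon, is_land):
    """Bias and STD of the COSMIC-GIM residuals per geomagnetic-latitude band,
    hemisphere and surface type. `maglat` holds geomagnetic latitudes and
    `is_land(lat, lon)` is a user-supplied land/ocean mask (both assumptions)."""
    land = np.array([is_land(la, lo) for la, lo in zip(lat, lon)])
    stats = {}
    for band, (low, high) in BANDS.items():
        for hemi, in_hemi in (("N", maglat >= 0), ("S", maglat < 0)):
            in_band = in_hemi & (np.abs(maglat) >= low) & (np.abs(maglat) < high)
            for surface, on_surface in (("land", land), ("ocean", ~land)):
                sel = in_band & on_surface
                if sel.any():
                    stats[(band, hemi, surface)] = (residuals[sel].mean(),
                                                    residuals[sel].std())
    return stats
```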
Fig. 12 The bias and STD of the VTEC residual between COSMIC and GIM at different geomagnetic latitudes and local time periods in 2008. The ‘Indian red’ and ‘dark blue’ bars represent the statistics over oceans and land, respectively
Fig. 13 Same as in Fig. 12 but for 2014
A summary is presented in Table 1 for the selected years 2008 and 2014, along with 2017, when the Sentinel results are included. It compares the VTEC from collocated RO, altimetry and GNSS observations with the same vertical coverage, giving the mean bias and STD. \({\text{VTEC}}_{{{\text{RO}}}}^{\prime }\) and \({\text{VTEC}}_{{{\text{RO}}}}\) are defined in Eqs. (6) and (3), respectively. For GIM, analyses are conducted separately for the Northern and Southern Hemispheres. The absolute discrepancy between COSMIC and Jason is minimal, with the STD increasing from 1.7 TECU to approximately 5.2 TECU at the higher solar activity level. The accuracy of GIM VTEC in the Northern Hemisphere is comparable to the spaceborne results, while it degrades by approximately 0.7 TECU in 2008 and 1.7 TECU in 2014 in the Southern Hemisphere. The bias and STD of the SA and RO data slightly increase in 2017. With appropriate calibration between different missions, SA and COSMIC ionospheric observations exhibit good agreement, as do the GNSS observations in areas with dense tracking stations. However, GIM VTEC in the Southern Hemisphere is less reliable, necessitating latitude-dependent weights and variances in data combination and assimilation.
6 The ionospheric characteristics observed by different techniques
To investigate the ionospheric characteristics reflected by the different observations, we analyzed the VTEC distributions of Jason, COSMIC and GIM during daytime periods (LT 12–16) in 2008 (Fig. 14) and 2014 (Fig. 15). The four seasons are defined in the same way as in Sect. 4.2. According to the daytime TEC maps shown in Fig. 14, the EIA phenomenon (denoted as zone ‘A’), symbolized by the northern and southern crests and the equatorial trough structure lying along the geomagnetic latitudes, is most prominently observed in the Jason VTEC. The development of the EIA demonstrates specific annual and semi-annual variations driven by plasma composition variations (Qian et al. 2013), being stronger during equinoxes than solstices, and shows a hemispheric asymmetry, especially during solstice seasons. The March equinox of 2008 displays more pronounced EIA structures than the September equinox, with the northern summer asymmetry being more noticeable than that of the winter hemisphere. The crests in the northern hemisphere are more discernible across all maps during the June solstices in both Figs. 14 and 15.
Fig. 14 The global distribution of VTEC during LT 12–16 by Jason, COSMIC and GIM, respectively, in 2008. The white solid and dashed lines are magnetic inclination contour lines at \(20^\circ\) intervals; the thick black solid curves are the contour lines of zero magnetic declination
Fig. 15 Same as Fig. 14, but for 2014 during LT 12–16
Figures 16 and 17 reveal more complicated ionospheric characteristics during nighttime periods (LT 0 ~ 4). Notably, the VTEC is suddenly enhanced at mid-latitudes around the southeastern Pacific Ocean and southwestern Atlantic Ocean (denoted as zone ‘B’ in Fig. 16 ). This phenomenon, popularly known as the Weddell Sea anomaly (WSA), is best demonstrated in the Jason and COSMIC VTEC maps in 2008. It is particularly prominent in longitude sectors where the dip equator shifts farthest toward the geographic pole. The spaceborne VTEC captures this structure in all seasons except for the June solstice under LSA conditions. Conversely, under HSA condition, the WSA is recognizable in the December solstice in all datasets but is barely visible in other seasons. The western boundary of the WSA is limited by contours of zero magnetic declination, depicted by thick black curves in the panels (Burns et al. 2008 ). While the COSMIC observations delineate the western boundary most accurately, the other two datasets exhibit some leakage beyond the magnetic declination line. Nevertheless, the GIM VTEC is less satisfactory in terms of describing these features due to the inadequate number of stations in the oceanic areas, especially in 2008.
Fig. 16 Same as Fig. 14 but for 2008 during LT 0–4
Fig. 17 Same as Fig. 15 but for 2014 during LT 0–4
In Fig. 16, the VTEC of GIM presents a striped enhancement in mid-latitude areas (denoted as zone ‘C’) along the magnetic inclination contour lines. This phenomenon, known as the mid-latitude electron density enhancement (MEDE), is less discernible in the COSMIC results and is mixed with the more pronounced WSA and MSNA in the Jason maps. As solar activity increases, the striped enhancement becomes barely visible in the satellite observations but remains detectable in the GIM VTEC, especially over the Pacific Ocean (panels (c)–(l), Fig. 17). The longitudinal difference in MEDE is evident in 2008, characterized by distinct peaks and troughs to the north/south of the EIA structure in the Pacific Ocean and weaker structures between \(60^\circ W\) and \(90^\circ E\). The occurrence of MEDE is attributed to the \({\varvec{E}} \times {\varvec{B}}\) drift, neutral wind and ionosphere–plasmasphere plasma flow. Although Rajesh et al. (2016) proposed that the MEDE occurs at all times of the day and is even more frequent during daytime in some cases, we are not able to confirm its existence during LT 12–16 in this study based on the displayed TEC maps.
Another distinction between the spaceborne and ground-based observations is the Wave Number Four (WN4) pattern (C. H. Lin et al. 2007). Longitudinal wave-like structures are prominent at night, as shown in Fig. 16 (the ‘D’ zone). There are stronger electron content peaks in the EIA zones over the South America, Africa and Southeast Asia regions. The COSMIC TEC maps outperform Jason in depicting this feature, benefitting from the even coverage of RO data. However, the GIM VTEC is less capable of reflecting delicate longitudinal variations. During daytime, the WN4 pattern is still discernible in Fig. 14 for the Jason and COSMIC observations at low and mid-latitude regions but is hardly identifiable in the GIM VTEC maps.
In the mid-latitude areas of \(40^\circ\) ~ \(60^\circ\) (geomagnetic latitude), the wave-2/wave-1 longitudinal structure is predominant, especially in the Southern Hemisphere during both daytime and nighttime (denoted as zone ‘E’ in Figs. 15 and 17). In Fig. 15, the VTEC values in the southern longitude sector of \(45^\circ W\) ~ \(135^\circ E\) are significantly increased, while the remaining longitudes generally experience a decrease. In Fig. 17, the wave-like structure is in antiphase with its daytime counterpart. The longitudinal west–east difference is intensified when the WSA occurs during the December solstice (panels (j)(k)(l)). Moreover, the longitudinal variations roughly coincide with changes in the sign of the magnetic declination, as indicated by the contours of zero declination. The phase of the wave-1 structure is opposite in the Northern and Southern Hemispheres and is more prominent in the southern half and in HSA years. The possible mechanisms of the longitudinal variations in the mid- and subauroral regions include thermospheric zonal winds, magnetic field geometry, pressure gradients, ion drag, viscous forces and the Coriolis force (Rajesh et al. 2016; Wang and Lühr 2016). For such large-scale longitudinal variations of the ionosphere, GIM demonstrates performance comparable to the SA and RO observations.
7 Conclusions
This study primarily investigates the differences among the RO-derived integrated VTEC, GIM VTEC and SA TEC measurements. COSMIC data were reprocessed with an improved Abel inversion method developed earlier to reduce the retrieval errors caused by the spherical symmetry assumption. Comparisons were conducted for the years 2008 and 2014 to assess the impact of solar activity. The key findings are summarized as follows:
Systematic differences between Jason series and Sentinel-3 remain relatively stable across varying solar activity levels. Jason-1 VTEC serves as a standard reference, facilitating the mapping of other datasets into its frame through specific calibration for different data versions;
COSMIC and Jason VTEC exhibit good agreement with minor biases throughout the year. The STDs are approximately 2 TECU and 5 TECU in 2008 and 2014, respectively. The discrepancy between SA and RO TEC increases during pronounced EIA structures at low latitudes, resulting from enhanced ionospheric horizontal gradients and degradation in the RO retrieval. The Jason and Sentinel series have accuracy comparable to COSMIC in 2017;
Deviations between compensated COSMIC and GIM VTEC demonstrate significant seasonal variations, with greater discrepancies during the spring and autumn equinoxes and reduced discrepancies during the solstices. STD ranges from 1.0 to 2.5 TECU in 2008 and increases dramatically to 3.0 ~ 10.0 TECU during periods of higher solar activities. GIM VTEC exhibits hemispheric asymmetry by latitude, indicating accuracy degradation in ocean-dominant regions;
EIA crests and troughs are well observed in Jason VTEC maps. Regional enhancements of WSA and longitudinal variations of WN4 in the ionosphere are better performed in Jason and COSMIC observations compared to GIM. COSMIC TEC maps outperform Jason in revealing the WN4 feature due to the even coverage of RO data over lands and oceans;
Spaceborne satellite measurements have advantages over the ground-based technique in reproducing delicate longitudinal features, except for the MEDE, which is more pronounced in GIM. Wave-2/wave-1 structures are especially predominant in the Southern Hemisphere at mid-latitude and subauroral areas and are represented well by all techniques.
The validation of multi-source ionospheric observations is fundamental for the combination of different datasets and for model construction in future work. Discrepancies among GNSS, radio occultation and SA satellite observations are associated with various factors, including station locations, data retrieval errors, spatial and temporal variations and solar activity conditions. Our comprehensive investigation aimed to minimize intermission bias and establish systematic differences under the same vertical coverage. The good agreement observed among different datasets indicates the feasibility of leveling the vertical range between satellites using the methods introduced in this study. This research could serve as a reference for determining observational covariance and weight matrices in data assimilation and ingestion studies.
Data availability
The COSMIC data used in this study are provided by the UCAR COSMIC Data Analysis and Archive Center at https://data.cosmic.ucar.edu/gnss-ro/ . The IGS GIM products can be downloaded from the NASA Crustal Dynamics Data Information System at https://cddis.nasa.gov/archive/gnss/products/ionex/ . The Jason altimeter products were produced and distributed by Aviso + ( https://www.aviso.altimetry.fr/ ), as part of the Ssalto ground processing segment. The Sentinel-3 marine data are organized by EUMETSAT and available at https://data.eumetsat.int/ .
Aa E, Zhang S (2022) 3-D regional ionosphere imaging and SED reconstruction with a new TEC-based ionospheric data assimilation system (TIDAS). Sp Weather. https://doi.org/10.1029/2022SW003055
Alizadeh MM, Schuh H, Todorova S, Schmidt M (2011) Global Ionosphere Maps of VTEC from GNSS, satellite altimetry, and formosat-3/COSMIC data. J Geod 85:975–987. https://doi.org/10.1007/s00190-011-0449-z
Anthes RA, Bernhardt PA, Chen Y et al (2008) The COSMIC/FORMOSAT-3 mission: early results. Bull Am Meteorol Soc 89:313. https://doi.org/10.1175/BAMS-89-3-313
Azpilicueta F, Nava B (2021) On the TEC bias of altimeter satellites. J Geod 95:1–15. https://doi.org/10.1007/s00190-021-01564-y
Brunini C, Meza A, Bosch W (2005) Temporal and spatial variability of the bias between TOPEX- and GPS-derived total electron content. J Geod 79:175–188. https://doi.org/10.1007/s00190-005-0448-z
Burns AG, Zeng Z, Wang W et al (2008) Behavior of the F2 peak Ionosphere over the South Pacific at dusk during quiet summer conditions from COSMIC data. J Geophys Res Sp Phys 113:1–9. https://doi.org/10.1029/2008JA013308
Bust GS, Garner TW, Gaussiran TL (2004) Ionospheric data assimilation three-dimensional (IDA3D): a global, multisensor, electron density specification algorithm. J Geophys Res Sp Phys 109:1–14. https://doi.org/10.1029/2003JA010234
Chen P, Yao Y, Yao W (2017) Global ionosphere maps based on GNSS, satellite altimetry, radio occultation and DORIS. GPS Solut 21:639–650. https://doi.org/10.1007/s10291-016-0554-9
Chen P, Liu H, Ma Y, Zheng N (2020) Accuracy and consistency of different global ionospheric maps released by IGS ionosphere associate analysis centers. Adv Sp Res 65:163–174. https://doi.org/10.1016/j.asr.2019.09.042
Cherniak I, Zakharenkova I (2019) Evaluation of the IRI-2016 and NeQuick electron content specification by COSMIC GPS radio occultation, ground-based GPS and Jason-2 joint altimeter/GPS observations. Adv Sp Res 63:1845–1859. https://doi.org/10.1016/j.asr.2018.10.036
Dettmering D, Heinkelmann R, Schmidt M (2011a) Systematic differences between VTEC obtained by different space-geodetic techniques during CONT08. J Geod 85:443–451. https://doi.org/10.1007/s00190-011-0473-z
Dettmering D, Schmidt M, Heinkelmann R, Seitz M (2011b) Combination of different space-geodetic observations for regional ionosphere modeling. J Geod 85:989–998. https://doi.org/10.1007/s00190-010-0423-1
Foelsche U, Kirchengast G (2002) A simple “geometric” mapping function for the hydrostatic delay at radio frequencies and assessment of its performance. Geophys Res Lett 29:1473. https://doi.org/10.1029/2001GL013744
Guo P, Wu M, Xu T et al (2015) An abel inversion method assisted by background model for GPS ionospheric radio occultation data. J Atmos Solar-Terrestrial Phys 123:71–81. https://doi.org/10.1016/j.jastp.2014.12.008
Hernández-Pajares M, Juan JM, Sanz J et al (2009) The IGS VTEC maps: a reliable source of ionospheric information since 1998. J Geod 83:263–275. https://doi.org/10.1007/s00190-008-0266-1
Hernández-Pajares M, Roma-Dollase D, Krankowski A et al (2017) Methodology and consistency of slant and vertical assessments for ionospheric electron content models. J Geod 91:1405–1414. https://doi.org/10.1007/s00190-017-1032-z
Hirokawa R, Fernández-Hernández I (2020) Open format specifications for PPP/PPP-RTK services: overview and interoperability assessment. Proc 33rd Int Tech Meet Satell Div Inst Navig. https://doi.org/10.33012/2020.17620
Imel A (1994) Evaluation of the TOPEX/POSEIDON dual-frequency ionosphere correction. J Geophys Res 99:24895–24906. https://doi.org/10.1029/94JC01869
Jin S, Gao C, Yuan L et al (2021) Long-term variations of plasmaspheric total electron content from topside GPS observations on LEO satellites. Remote Sens 13:1–15. https://doi.org/10.3390/rs13040545
Klobuchar JA (1987) Ionospheric time-delay algorithm for single-frequency GPS users. IEEE Trans Aerosp Electron Syst 23:325–331. https://doi.org/10.1109/TAES.1987.310829
Lafon T, Parisot F (1998) The Jason-1 satellite design and development status. In: Proceedings of the 4th International Symposium on Small Satellites Systems and Services, Sept. 14–18, 1998, Antibes Juan les Pins, France
Li W, Huang L, Zhang S, Chai Y (2019) Assessing global ionosphere TEC maps with satellite altimetry and ionospheric radio occultation observations. Sensors (Switzerland) 19:1–13. https://doi.org/10.3390/s19245489
Li Z, Wang N, Hernández-Pajares M et al (2020) IGS real-time service for global ionospheric total electron content modeling. J Geod. https://doi.org/10.1007/s00190-020-01360-0
Lin CH, Wang W, Hagan ME et al (2007) Plausible effect of atmospheric tides on the equatorial ionosphere observed by the FORMOSAT-3/COSMIC: three-dimensional electron density structures. Geophys Res Lett 34:1–5. https://doi.org/10.1029/2007GL029265
Pedatella NM, Zakharenkova I, Braun JJ et al (2021) Processing and validation of FORMOSAT-7/COSMIC-2 GPS total electron content observations. Radio Sci. https://doi.org/10.1029/2021rs007267
Qian L, Burns AG, Solomon SC, Wang W (2013) Annual/semiannual variation of the ionosphere. Geophys Res Lett 40:1928–1933. https://doi.org/10.1002/grl.50448
Rajesh PK, Liu JY, Balan N et al (2016) Morphology of midlatitude electron density enhancement using total electron content measurements. J Geophys Res A Sp Phys 121:1503–1517. https://doi.org/10.1002/2015JA022251
Roinard H, Lievin M (2017) Jason-2 validation and cross-calibration activities (Annual report 2016). SALP-RP-MA-EA-23058-CLS, 1rev 2
Roinard H, Michaud L (2020) Jason-3 validation and cross calibration activities (Annual report 2019). SALP-RP-MA-EA-23399-CLS, Issue 1.1
Roma-Dollase D, Hernández-Pajares M, Krankowski A et al (2018) Consistency of seven different GNSS global ionospheric mapping techniques during one solar cycle. J Geod 92:691–706. https://doi.org/10.1007/s00190-017-1088-9
Schreiner WS, Weiss JP, Anthes RA et al (2020) COSMIC-2 radio occultation constellation: first results. Geophys Res Lett 47:1–7. https://doi.org/10.1029/2019GL086841
Wang H, Lühr H (2016) Longitudinal variation in zonal winds at subauroral regions: possible mechanisms. J Geophys Res A Sp Phys 121:745–763. https://doi.org/10.1002/2015JA022086
Wielgosz P, Milanowska B, Krypiak-Gregorczyk A, Jarmołowski W (2021) Validation of GNSS-derived global ionosphere maps for different solar activity levels: case studies for years 2014 and 2018. GPS Solut. https://doi.org/10.1007/s10291-021-01142-x
Wu MJ, Guo P, Fu NF et al (2018) Improvement of the IRI model using F2 layer parameters derived from GPS/COSMIC radio occultation observations. J Geophys Res Sp Phys 123:9815–9835. https://doi.org/10.1029/2018JA026092
Wu MJ, Guo P, Fu NF et al (2019) Evaluation of abel inversion method assisted by an improved IRI model. J Geophys Res Sp Phys 124:5995–6011. https://doi.org/10.1029/2019JA026880
Wu MJ, Xu X, Li F et al (2021) Plasmaspheric scale height modeling based on COSMIC radio occultation data. J Atmos Solar-Terrestrial Phys. https://doi.org/10.1016/j.jastp.2021.105555
Yao Y, Liu L, Kong J, Zhai C (2018) Global ionospheric modeling based on multi-GNSS, satellite altimetry, and Formosat-3/COSMIC data. GPS Solut. https://doi.org/10.1007/s10291-018-0770-6
Yue X, Schreiner WS, Lei J et al (2010) Error analysis of abel retrieved electron density profiles from radio occultation measurements. Ann Geophys 28:217–222. https://doi.org/10.5194/angeo-28-217-2010
Yue X, Schreiner WS, Kuo YH, Lei J (2015) Ionosphere equatorial ionization anomaly observed by GPS radio occultations during 2006–2014. J Atmos Solar-Terrestrial Phys 129:30–40. https://doi.org/10.1016/j.jastp.2015.04.004
Zhong J, Lei J, Dou X, Yue X (2016) Assessment of vertical TEC mapping functions for space-based GNSS observations. GPS Solut 20:353–362. https://doi.org/10.1007/s10291-015-0444-6
Acknowledgements
The authors are very grateful for the open access provided by the UCAR COSMIC Data Analysis and Archive Center for all the COSMIC data, and by the AVISO CNES data center and the NASA Crustal Dynamics Data Information System for the Jason and IGS GIM products. Thanks also to EUMETSAT for the data access services for the Sentinel-3 marine data.
This research was funded by the National Key R&D Program of China, grant number 2020YFA0713501, and the National Natural Science Foundation of China (12273094, 12273093 and 42005099), the Natural Science Foundation of Shanghai (23ZR1473800) and the Youth Innovation Promotion Association CAS.
Author information
Authors and affiliations.
Shanghai Astronomical Observatory, Chinese Academy of Sciences, Shanghai, 200030, China
M. J. Wu, P. Guo, X. Ma, J. C. Xue & X. G. Hu
Advance Research Institute, Taizhou University, Taizhou, 318000, China
Institute of Meteorology and Oceanography, National University of Defense Technology, Nanjing, 410000, China
Shanghai Meteorological Bureau, Shanghai, 200030, China
You can also search for this author in PubMed Google Scholar
Contributions
M. J. Wu designed the research and performed the coding and calculation work in this manuscript; M. J. Wu and P. Guo contributed by validating the results and writing the draft; X. Ma and J. C. Xue were responsible for the satellite altimetry and GNSS data collection and analysis; M. Liu contributed to the revision work and X. G. Hu contributed to the conceptual discussions and editing of the manuscript.
Corresponding author
Correspondence to M. J. Wu .
Ethics declarations
Conflict of interest.
The authors declare that they have no conflicts of interest.
Additional information
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
Reprints and permissions
About this article
Wu, M.J., Guo, P., Ma, X. et al. Differences among the total electron content derived by radio occultation, global ionospheric maps and satellite altimetry. J Geod 98 , 82 (2024). https://doi.org/10.1007/s00190-024-01893-8
Download citation
Received : 13 September 2023
Accepted : 25 August 2024
Published : 11 September 2024
DOI : https://doi.org/10.1007/s00190-024-01893-8
Share this article
Anyone you share the following link with will be able to read this content:
Sorry, a shareable link is not currently available for this article.
Provided by the Springer Nature SharedIt content-sharing initiative
- Ionospheric radio occultation
- Global ionospheric maps
- Satellite altimetry TEC
- Systematic bias
- Find a journal
- Publish with us
- Track your research
Protecting against researcher bias in secondary data analysis: challenges and potential solutions
Jessie R. Baldwin
1 Department of Clinical, Educational and Health Psychology, Division of Psychology and Language Sciences, University College London, London, WC1H 0AP UK
2 Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, UK
Jean-Baptiste Pingault
Tabea Schoeler
Hannah M. Sallis
3 MRC Integrative Epidemiology Unit at the University of Bristol, Bristol Medical School, University of Bristol, Bristol, UK
4 School of Psychological Science, University of Bristol, Bristol, UK
5 Centre for Academic Mental Health, Population Health Sciences, University of Bristol, Bristol, UK
Marcus R. Munafò
6 NIHR Biomedical Research Centre, University Hospitals Bristol NHS Foundation Trust and University of Bristol, Bristol, UK
Analysis of secondary data sources (such as cohort studies, survey data, and administrative records) has the potential to provide answers to science and society’s most pressing questions. However, researcher biases can lead to questionable research practices in secondary data analysis, which can distort the evidence base. While pre-registration can help to protect against researcher biases, it presents challenges for secondary data analysis. In this article, we describe these challenges and propose novel solutions and alternative approaches. Proposed solutions include approaches to (1) address bias linked to prior knowledge of the data, (2) enable pre-registration of non-hypothesis-driven research, (3) help ensure that pre-registered analyses will be appropriate for the data, and (4) address difficulties arising from reduced analytic flexibility in pre-registration. For each solution, we provide guidance on implementation for researchers and data guardians. The adoption of these practices can help to protect against researcher bias in secondary data analysis, to improve the robustness of research based on existing data.
Introduction
Secondary data analysis has the potential to provide answers to science and society’s most pressing questions. An abundance of secondary data exists—cohort studies, surveys, administrative data (e.g., health records, crime records, census data), financial data, and environmental data—that can be analysed by researchers in academia, industry, third-sector organisations, and the government. However, secondary data analysis is vulnerable to questionable research practices (QRPs) which can distort the evidence base. These QRPs include p-hacking (i.e., exploiting analytic flexibility to obtain statistically significant results), selective reporting of statistically significant, novel, or “clean” results, and hypothesising after the results are known (HARK-ing; i.e., presenting unexpected results as if they were predicted) [ 1 ]. Indeed, findings obtained from secondary data analysis are not always replicable [ 2 , 3 ], reproducible [ 4 ], or robust to analytic choices [ 5 , 6 ]. Preventing QRPs in research based on secondary data is therefore critical for scientific and societal progress.
A primary cause of QRPs is common cognitive biases that affect the analysis, reporting, and interpretation of data [ 7 – 10 ]. For example, apophenia (the tendency to see patterns in random data) and confirmation bias (the tendency to focus on evidence that is consistent with one’s beliefs) can lead to particular analytical choices and selective reporting of “publishable” results [ 11 – 13 ]. In addition, hindsight bias (the tendency to view past events as predictable) can lead to HARK-ing, so that observed results appear more compelling.
The scope for these biases to distort research outputs from secondary data analysis is perhaps particularly acute, for two reasons. First, researchers now have increasing access to high-dimensional datasets that offer a multitude of ways to analyse the same data [ 6 ]. Such analytic flexibility can lead to different conclusions depending on the analytical choices made [ 5 , 14 , 15 ]. Second, current incentive structures in science reward researchers for publishing statistically significant, novel, and/or surprising findings [ 16 ]. This combination of opportunity and incentive may lead researchers—consciously or unconsciously—to run multiple analyses and only report the most “publishable” findings.
One way to help protect against the effects of researcher bias is to pre-register research plans [ 17 , 18 ]. This can be achieved by pre-specifying the rationale, hypotheses, methods, and analysis plans, and submitting these to either a third-party registry (e.g., the Open Science Framework [OSF]; https://osf.io/ ), or a journal in the form of a Registered Report [ 19 ]. Because research plans and hypotheses are specified before the results are known, pre-registration reduces the potential for cognitive biases to lead to p-hacking, selective reporting, and HARK-ing [ 20 ]. While pre-registration is not necessarily a panacea for preventing QRPs (Table 1), meta-scientific evidence has found that pre-registered studies and Registered Reports are more likely to report null results [ 21 – 23 ], smaller effect sizes [ 24 ], and be replicated [ 25 ]. Pre-registration is increasingly being adopted in epidemiological research [ 26 , 27 ], and is even required for access to data from certain cohorts (e.g., the Twins Early Development Study [ 28 ]). However, pre-registration (and other open science practices; Table 2) can pose particular challenges to researchers conducting secondary data analysis [ 29 ], motivating the need for alternative approaches and solutions. Here we describe such challenges, before proposing potential solutions to protect against researcher bias in secondary data analysis (summarised in Fig. 1).
Limitations in the use of pre-registration to address QRPs
| Limitation | Example |
|---|---|
| Pre-registration may not prevent selective reporting/outcome switching | The COMPare Trials Project [ ] assessed outcome switching in clinical trials published in the top 5 medical journals between October 2015 and January 2016. Among 67 clinical trials, on average, each trial reported 58.2% of its specified outcomes and silently added 5.3 new outcomes |
| Pre-registration may be performed retrospectively after the results are known | Mathieu et al. [ ] assessed 323 clinical trials published in 2008 in the top 10 medical journals. 45 trials (13.9%) were registered after the completion of the study |
| Deviations from pre-registered protocols are common | Claesen et al. [ ] assessed all pre-registered articles published in Psychological Science between February 2015 and November 2017. All 23 articles deviated from the pre-registration, and only one study disclosed the deviation |
| Pre-registration may not improve the credibility of hypotheses | Rubin [ ] and Szollosi, Kellen [ ] argue that formulating hypotheses post hoc (HARK-ing) is not problematic if they are deduced from pre-existing theory or evidence, rather than induced from the current results |
Challenges and potential solutions regarding sharing pre-existing data
| Challenge | Potential solutions |
|---|---|
| Many datasets cannot be publicly shared because of ethical and legal requirements | Share a synthetic dataset (a simulated dataset which mimics an original dataset by preserving its statistical properties and associations between variables); for a tutorial, see Quintana [ ]. Alternatively, provide specific instructions on how data can be accessed and links to codebooks/data dictionaries with variable information [ ] |
| If different researchers conduct similar statistical tests on a dataset and do not correct for multiple testing, this increases the risk of false positives [ ] | Test whether findings replicate in independent samples, as the chance of two identical false positives occurring in independent samples is small. Also ensure that the research question is distinct from prior studies on the given dataset, so that the proposed analyses are part of a different statistical family; multiple analyses on a single dataset will not lead to false positives if the analyses are part of different statistical families |
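The first challenge in the table above mentions sharing a synthetic dataset. As a rough illustration of the idea (not Quintana’s tutorial), the sketch below samples synthetic rows from a multivariate normal distribution fitted to the observed means and covariance of a toy dataset; dedicated synthetic-data tools handle mixed variable types and disclosure risk far more carefully.

```python
import numpy as np
import pandas as pd

def synthesise_gaussian(df: pd.DataFrame, n_rows: int, seed: int = 11) -> pd.DataFrame:
    """Sample synthetic rows preserving the means and covariance of numeric columns."""
    rng = np.random.default_rng(seed)
    values = rng.multivariate_normal(df.mean().to_numpy(), df.cov().to_numpy(), size=n_rows)
    return pd.DataFrame(values, columns=df.columns)

# Example: the synthetic data reproduce the original correlation structure.
rng = np.random.default_rng(5)
x = rng.normal(size=2000)
original = pd.DataFrame({"x": x, "y": 0.6 * x + rng.normal(size=2000)})
print(original.corr().round(2))
print(synthesise_gaussian(original, n_rows=2000).corr().round(2))
```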
Fig. 1: Challenges in pre-registering secondary data analysis and potential solutions (according to researcher motivations). Note: In the “Potential solution” column, blue boxes indicate solutions that are researcher-led; green boxes indicate solutions that should be facilitated by data guardians
Challenges of pre-registration for secondary data analysis
Prior knowledge of the data
Researchers conducting secondary data analysis commonly analyse data from the same dataset multiple times throughout their careers. However, prior knowledge of the data increases risk of bias, as prior expectations about findings could motivate researchers to pursue certain analyses or questions. In the worst-case scenario, a researcher might perform multiple preliminary analyses, and only pursue those which lead to notable results (perhaps posting a pre-registration for these analyses, even though it is effectively post hoc). However, even if the researcher has not conducted specific analyses previously, they may be biased (either consciously or subconsciously) to pursue certain analyses after testing related questions with the same variables, or even by reading past studies on the dataset. As such, pre-registration cannot fully protect against researcher bias when researchers have previously accessed the data.
Research may not be hypothesis-driven
Pre-registration and Registered Reports are tailored towards hypothesis-driven, confirmatory research. For example, the OSF pre-registration template requires researchers to state “specific, concise, and testable hypotheses”, while Registered Reports do not permit purely exploratory research [ 30 ], although a new Exploratory Reports format now exists [ 31 ]. However, much research involving secondary data is not focused on hypothesis testing, but is exploratory, descriptive, or focused on estimation—in other words, examining the magnitude and robustness of an association as precisely as possible, rather than simply testing a point null. Furthermore, without a strong theoretical background, hypotheses will be arbitrary and could lead to unhelpful inferences [ 32 , 33 ], and so should be avoided in novel areas of research.
Pre-registered analyses are not appropriate for the data
With pre-registration, there is always a risk that the data will violate the assumptions of the pre-registered analyses [ 17 ]. For example, a researcher might pre-register a parametric test, only for the data to be non-normally distributed. However, in secondary data analysis, the extent to which the data shape the appropriate analysis can be considerable. First, longitudinal cohort studies are often subject to missing data and attrition. Approaches to deal with missing data (e.g., listwise deletion; multiple imputation) depend on the characteristics of missing data (e.g., the extent and patterns of missingness [ 34 ]), and so pre-specifying approaches to dealing with missingness may be difficult, or extremely complex. Second, certain analytical decisions depend on the nature of the observed data (e.g., the choice of covariates to include in a multiple regression might depend on the collinearity between the measures, or the degree of missingness of different measures that capture the same construct). Third, much secondary data (e.g., electronic health records and other administrative data) were never collected for research purposes, so can present several challenges that are impossible to predict in advance [ 35 ]. These issues can limit a researcher’s ability to pre-register a precise analytic plan prior to accessing secondary data.
Lack of flexibility in data analysis
Concerns have been raised that pre-registration limits flexibility in data analysis, including justifiable exploration [ 36 – 38 ]. For example, by requiring researchers to commit to a pre-registered analysis plan, pre-registration could prevent researchers from exploring novel questions (with a hypothesis-free approach), conducting follow-up analyses to investigate notable findings [ 39 ], or employing newly published methods with advantages over those pre-registered. While this concern is also likely to apply to primary data analysis, it is particularly relevant to certain fields involving secondary data analysis, such as genetic epidemiology, where new methods are rapidly being developed [ 40 ], and follow-up analyses are often required (e.g., in a genome-wide association study to further investigate the role of a genetic variant associated with a phenotype). However, this concern is perhaps over-stated – pre-registration does not preclude unplanned analyses; it simply makes it more transparent that these analyses are post hoc. Nevertheless, another understandable concern is that reduced analytic flexibility could lead to difficulties in publishing papers and accruing citations. For example, pre-registered studies are more likely to report null results [ 22 , 23 ], likely due to reduced analytic flexibility and selective reporting. While this is a positive outcome for research integrity, null results are less likely to be published [ 13 , 41 , 42 ] and cited [ 11 ], which could disadvantage researchers’ careers.
In this section, we describe potential solutions to address the challenges involved in pre-registering secondary data analysis, including approaches to (1) address bias linked to prior knowledge of the data, (2) enable pre-registration of non-hypothesis-driven research, (3) ensure that pre-planned analyses will be appropriate for the data, and (4) address potential difficulties arising from reduced analytic flexibility.
Challenge: Prior knowledge of the data
Declare prior access to data
To increase transparency about potential biases arising from knowledge of the data, researchers could routinely report all prior data access in a pre-registration [ 29 ]. This would ideally include evidence from an independent gatekeeper (e.g., a data guardian of the study) stating whether data and relevant variables were accessed by each co-author. To facilitate this process, data guardians could set up a central “electronic checkout” system that records which researchers have accessed data, what data were accessed, and when [ 43 ]. The researcher or data guardian could then provide links to the checkout histories for all co-authors in the pre-registration, to verify their prior data access. If it is not feasible to provide such objective evidence, authors could self-certify their prior access to the dataset and where possible, relevant variables—preferably listing any publications and in-preparation studies based on the dataset [ 29 ]. Of course, self-certification relies on trust that researchers will accurately report prior data access, which could be challenging if the study involves a large number of authors, or authors who have been involved on many studies on the dataset. However, it is likely to be the most feasible option at present as many datasets do not have available electronic records of data access. For further guidance on self-certifying prior data access when pre-registering secondary data analysis studies on a third-party registry (e.g., the OSF), we recommend referring to the template by Van den Akker, Weston [ 29 ].
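As an illustration, a data guardian’s “electronic checkout” log could be as simple as an append-only file with one row per researcher, dataset, and variable accessed. The sketch below (with hypothetical file and variable names, not a system described in the article) shows the idea:

```python
import csv
from datetime import datetime, timezone

LOG_FILE = "data_access_log.csv"  # hypothetical location maintained by the data guardian

def record_access(researcher_id: str, dataset: str, variables: list) -> None:
    """Append one row per accessed variable, with a UTC timestamp."""
    timestamp = datetime.now(timezone.utc).isoformat()
    with open(LOG_FILE, "a", newline="") as f:
        writer = csv.writer(f)
        for variable in variables:
            writer.writerow([timestamp, researcher_id, dataset, variable])

# Example: log that a researcher accessed two variables from a cohort dataset.
record_access("researcher_001", "cohort_wave3", ["maternal_smoking", "child_iq"])
```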
The extent to which prior access to data renders pre-registration invalid is debatable. On the one hand, even if data have been accessed previously, pre-registration is likely to reduce QRPs by encouraging researchers to commit to a pre-specified analytic strategy. On the other hand, pre-registration does not fully protect against researcher bias where data have already been accessed, and can lend added credibility to study claims, which may be unfounded. Reporting prior data access in a pre-registration is therefore important to make these potential biases transparent, so that readers and reviewers can judge the credibility of the findings accordingly. However, for a more rigorous solution which protects against researcher bias in the context of prior data access, researchers should consider adopting a multiverse approach.
Conduct a multiverse analysis
A multiverse analysis involves identifying all potential analytic choices that could justifiably be made to address a given research question (e.g., different ways to code a variable, combinations of covariates, and types of analytic model), implementing them all, and reporting the results [ 44 ]. Notably, this method differs from the traditional approach in which findings from only one analytic method are reported. It is conceptually similar to a sensitivity analysis, but it is far more comprehensive, as often hundreds or thousands of analytic choices are reported, rather than a handful. By showing the results from all defensible analytic approaches, multiverse analysis reduces scope for selective reporting and provides insight into the robustness of findings against analytical choices (for example, if there is a clear convergence of estimates, irrespective of most analytical choices). For causal questions in observational research, Directed Acyclic Graphs (DAGs) could be used to inform selection of covariates in multiverse approaches [ 45 ] (i.e., to ensure that confounders, rather than mediators or colliders, are controlled for).
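To make the idea concrete, the following is a minimal multiverse sketch in Python (illustrative only; it uses simulated data and a hypothetical exposure–outcome question, not any analysis from the studies cited here). Every combination of covariate set and exposure coding is fitted, and the full set of estimates is collected rather than a single preferred model:

```python
import itertools
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for a real cohort dataset.
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "exposure": rng.normal(size=n),
    "age": rng.normal(40, 10, size=n),
    "sex": rng.integers(0, 2, size=n),
    "ses": rng.normal(size=n),
})
df["outcome"] = 0.2 * df["exposure"] + 0.1 * df["ses"] + rng.normal(size=n)

covariates = ["age", "sex", "ses"]
exposure_codings = {"continuous": "exposure", "binary": "I(exposure > 0)"}

results = []
for k in range(len(covariates) + 1):                      # every subset of covariates
    for subset in itertools.combinations(covariates, k):
        for coding, term in exposure_codings.items():     # every way of coding the exposure
            formula = "outcome ~ " + " + ".join([term, *subset])
            fit = smf.ols(formula, data=df).fit()
            exposure_param = [p for p in fit.params.index if "exposure" in p][0]
            results.append({
                "coding": coding,
                "covariates": " + ".join(subset) or "none",
                "estimate": fit.params[exposure_param],
                "p_value": fit.pvalues[exposure_param],
            })

multiverse = pd.DataFrame(results)
# Report the full distribution of estimates, not a single hand-picked model.
print(multiverse.sort_values("estimate").to_string(index=False))
```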
Specification curve analysis [ 46 ] is a form of multiverse analysis that has been applied to examine the robustness of epidemiological findings to analytic choices [ 6 , 47 ]. Specification curve analysis involves three steps: (1) identifying all analytic choices – termed “specifications”, (2) displaying the results graphically with magnitude of effect size plotted against analytic choice, and (3) conducting joint inference across all results. When applied to the association between digital technology use and adolescent well-being [ 6 ], specification curve analysis showed that the (small, negative) association diminished after accounting for adequate control variables and recall bias – demonstrating the sensitivity of results to analytic choices.
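A possible sketch of the plotting step (step 2) of a specification curve analysis is shown below. It assumes the `multiverse` data frame produced by the previous sketch (any table with one estimate and p-value per specification would work), and is illustrative rather than a reproduction of the cited analyses:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

try:
    multiverse  # the data frame produced by the multiverse sketch above
except NameError:
    # Stand-in so this sketch also runs on its own: one estimate/p-value per specification.
    rng = np.random.default_rng(4)
    multiverse = pd.DataFrame({"estimate": rng.normal(0.15, 0.08, 60),
                               "p_value": rng.uniform(0, 0.2, 60)})

curve = multiverse.sort_values("estimate").reset_index(drop=True)
plt.figure(figsize=(7, 3))
plt.scatter(curve.index, curve["estimate"],
            c=(curve["p_value"] < 0.05), cmap="coolwarm", s=15)  # colour = nominal significance
plt.axhline(0, linestyle="--", linewidth=1)
plt.xlabel("Specification (ranked by estimate)")
plt.ylabel("Estimated effect of exposure")
plt.title("Specification curve (illustrative)")
plt.tight_layout()
plt.show()
```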
Despite the benefits of the multiverse approach in addressing analytic flexibility, it is not without limitations. First, because each analytic choice is treated as equally valid, including less justifiable models could bias the results away from the truth. Second, the choice of specifications can be biased by prior knowledge (e.g., a researcher may choose to omit a covariate to obtain a particular result). Third, multiverse analysis may not entirely prevent selective reporting (e.g., if the full range of results are not reported), although pre-registering multiverse approaches (and specifying analytic choices) could mitigate this. Last, and perhaps most importantly, multiverse analysis is technically challenging (e.g., when there are hundreds or thousands of analytic choices) and can be impractical for complex analyses, very large datasets, or when computational resources are limited. However, this burden can be somewhat reduced by tutorials and packages which are being developed to standardise the procedure and reduce computational time [see 48 , 49 ].
Challenge: Research may not be hypothesis-driven
Pre-register research questions and conditions for interpreting findings
Observational research arguably does not need to have a hypothesis to benefit from pre-registration. For studies that are descriptive or focused on estimation, we recommend pre-registering research questions, analysis plans, and criteria for interpretation. Analytic flexibility will be limited by pre-registering specific research questions and detailed analysis plans, while post hoc interpretation will be limited by pre-specifying criteria for interpretation [ 50 ]. The potential for HARK-ing will also be minimised because readers can compare the published study to the original pre-registration, where a-priori hypotheses were not specified.
Detailed guidance on how to pre-register research questions and analysis plans for secondary data is provided in Van den Akker’s [ 29 ] tutorial. To pre-specify conditions for interpretation, it is important to anticipate – as much as possible – all potential findings, and state how each would be interpreted. For example, suppose that a researcher aims to test a causal relationship between X and Y using a multivariate regression model with longitudinal data. Assuming that all potential confounders have been fully measured and controlled for (albeit a strong assumption) and statistical power is high, three broad sets of results and interpretations could be pre-specified. First, an association between X and Y that is similar in magnitude to the unadjusted association would be consistent with a causal relationship. Second, an association between X and Y that is attenuated after controlling for confounders would suggest that the relationship is partly causal and partly confounded. Third, a minimal, non-statistically significant adjusted association would suggest a lack of evidence for a causal effect of X on Y. Depending on the context of the study, criteria could also be provided on the threshold (or range of thresholds) at which the effect size would justify different interpretations [ 51 ], be considered practically meaningful, or the smallest effect size of interest for equivalence tests [ 52 ]. While researcher biases might still affect the pre-registered criteria for interpreting findings (e.g., toward over-interpreting a small effect size as meaningful), this bias will at least be transparent in the pre-registration.
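As an illustration of how such criteria could be written down unambiguously before the data are seen, the sketch below encodes the three broad interpretations as a simple function. The attenuation threshold and alpha level are hypothetical choices that a researcher would need to justify in the pre-registration; they are not prescribed by the article.

```python
def interpret(unadjusted: float, adjusted: float, p_adjusted: float,
              alpha: float = 0.05, attenuation_threshold: float = 0.5) -> str:
    """Map the adjusted estimate onto one of three pre-registered interpretations."""
    if p_adjusted >= alpha:
        return "No evidence for a causal effect of X on Y"
    if abs(adjusted) >= attenuation_threshold * abs(unadjusted):
        return "Consistent with a (largely) causal relationship"
    return "Partly causal, partly confounded"

# Example: the adjusted estimate is well under half the unadjusted one but still significant.
print(interpret(unadjusted=0.40, adjusted=0.15, p_adjusted=0.01))
```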
Use a holdout sample to delineate exploratory and confirmatory research
Where researchers wish to integrate exploratory research into a pre-registered, confirmatory study, a holdout sample approach can be used [ 18 ]. Creating a holdout sample refers to the process of randomly splitting the dataset into two parts, often referred to as ‘training’ and ‘holdout’ datasets. To delineate exploratory and confirmatory research, researchers can first conduct exploratory data analysis on the training dataset (which should comprise a moderate fraction of the data, e.g., 35% [ 53 ]). Based on the results of the discovery process, researchers can pre-register hypotheses and analysis plans to formally test on the holdout dataset. This process has parallels with cross-validation in machine learning, in which the dataset is split and the model is developed on the training dataset, before being tested on the test dataset. The approach enables a flexible discovery process, before formally testing discoveries in an unbiased way.
When considering whether to use the holdout sample approach, three points should be noted. First, because the training dataset is not reusable, there will be a reduced sample size and loss of power relative to analysing the whole dataset. As such, the holdout sample approach will only be appropriate when the original dataset is large enough to provide sufficient power in the holdout dataset. Second, when the training dataset is used for exploration, subsequent confirmatory analyses on the holdout dataset may be overfitted (due to both datasets being drawn from the same sample), so replication in independent samples is recommended. Third, the holdout dataset should be created by an independent data manager or guardian, to ensure that the researcher does not have knowledge of the full dataset. However, it is straightforward to randomly split a dataset into a holdout and training sample and we provide example R code at: https://github.com/jr-baldwin/Researcher_Bias_Methods/blob/main/Holdout_script.md .
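For illustration, a minimal Python equivalent of such a split is sketched below (the linked repository provides the authors’ R script; this is not that script). In practice, an independent data guardian would run it, so the analyst only sees the training portion before pre-registering.

```python
import numpy as np
import pandas as pd

def split_holdout(df: pd.DataFrame, train_fraction: float = 0.35, seed: int = 2024):
    """Randomly assign rows to a training set (for exploration) and a holdout set
    (reserved for the pre-registered confirmatory analysis)."""
    rng = np.random.default_rng(seed)
    train_idx = rng.choice(df.index, size=int(len(df) * train_fraction), replace=False)
    return df.loc[train_idx], df.drop(index=train_idx)

# Example with a toy dataset: roughly 35% for exploration, the rest held out.
toy = pd.DataFrame({"exposure": range(1000), "outcome": range(1000)})
training, holdout = split_holdout(toy)
print(len(training), len(holdout))  # 350 650
```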
Challenge: Pre-registered analyses are not appropriate for the data
Use blinding to test proposed analyses
One method to help ensure that pre-registered analyses will be appropriate for the data is to trial the analyses on a blinded dataset [ 54 ], before pre-registering. Data blinding involves obscuring the data values or labels prior to data analysis, so that the proposed analyses can be trialled on the data without observing the actual findings. Various types of blinding strategies exist [ 54 ], but one method that is appropriate for epidemiological data is “data scrambling” [ 55 ]. This involves randomly shuffling the data points so that any associations between variables are obscured, whilst the variable distributions (and amounts of missing data) remain the same. We provide a tutorial for how to implement this in R (see https://github.com/jr-baldwin/Researcher_Bias_Methods/blob/main/Data_scrambling_tutorial.md ). Ideally the data scrambling would be done by a data guardian who is independent of the research, to ensure that the main researcher does not access the data prior to pre-registering the analyses. Once the researcher is confident with the analyses, the study can be pre-registered, and the analyses conducted on the unscrambled dataset.
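A minimal Python sketch of data scrambling is shown below (the linked repository provides an R tutorial; this is an illustrative equivalent, not that tutorial). Each column is permuted independently, so marginal distributions and the amount of missingness are preserved while associations between variables are removed.

```python
import numpy as np
import pandas as pd

def scramble(df: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Independently permute each column, breaking associations between variables
    while keeping each variable's distribution (and missingness) intact."""
    rng = np.random.default_rng(seed)
    scrambled = df.copy()
    for col in scrambled.columns:
        scrambled[col] = rng.permutation(scrambled[col].to_numpy())
    return scrambled

# Example: the x-y correlation disappears, but each column's distribution is unchanged.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
df = pd.DataFrame({"x": x, "y": 0.8 * x + rng.normal(size=1000)})
print(df.corr().round(2))            # strong correlation before scrambling
print(scramble(df).corr().round(2))  # near-zero correlation after scrambling
```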
Blinded analysis offers several advantages for ensuring that pre-registered analyses are appropriate, with some limitations. First, blinded analysis allows researchers to directly check the distribution of variables and amounts of missingness, without having to make assumptions about the data that may not be met, or spend time planning contingencies for every possible scenario. Second, blinded analysis prevents researchers from gaining insight into the potential findings prior to pre-registration, because associations between variables are masked. However, because of this, blinded analysis does not enable researchers to check for collinearity, predictors of missing data, or other covariances that may be necessary for model specification. As such, blinded analysis will be most appropriate for researchers who wish to check the data distribution and amounts of missingness before pre-registering.
Trial analyses on a dataset excluding the outcome
Another method to help ensure that pre-registered analyses will be appropriate for the data is to trial analyses on a dataset excluding outcome data. For example, data managers could provide researchers with part of the dataset containing the exposure variable(s) plus any covariates and/or auxiliary variables. The researcher can then trial and refine the analyses ahead of pre-registering, without gaining insight into the main findings (which require the outcome data). This approach is used to mitigate bias in propensity score matching studies [ 26 , 56 ], as researchers use data on the exposure and covariates to create matched groups, prior to accessing any outcome data. Once the exposed and non-exposed groups have been matched effectively, researchers pre-register the protocol ahead of viewing the outcome data. Notably though, this approach could help researchers to identify and address other analytical challenges involving secondary data. For example, it could be used to check multivariable distributional characteristics, test for collinearity between multiple predictor variables, or identify predictors of missing data for multiple imputation.
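For illustration, the sketch below shows how a data guardian might release such a dataset with the outcome withheld; the column names are hypothetical.

```python
import pandas as pd

OUTCOME_COLUMNS = ["depression_score"]  # hypothetical outcome to withhold

def withhold_outcome(full_data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the dataset with the outcome column(s) removed."""
    return full_data.drop(columns=OUTCOME_COLUMNS)

# Example: the researcher receives only the exposure and covariates for trial analyses.
full = pd.DataFrame({
    "exposure": [0, 1, 1, 0],
    "age": [34, 41, 29, 52],
    "sex": [1, 0, 0, 1],
    "depression_score": [3, 7, 5, 2],
})
print(withhold_outcome(full).columns.tolist())  # ['exposure', 'age', 'sex']
```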
This approach offers certain benefits for researchers keen to ensure that pre-registered analyses are appropriate for the observed data, with some limitations. Regarding benefits, researchers will be able to examine associations between variables (excluding the outcome), unlike the data scrambling approach described above. This would be helpful for checking certain assumptions (e.g., collinearity or characteristics of missing data such as whether it is missing at random). In addition, the approach is easy to implement, as the dataset can be initially created without the outcome variable, which can then be added after pre-registration, minimising burden on data guardians. Regarding limitations, it is possible that accessing variables in advance could provide some insight into the findings. For example, if a covariate is known to be highly correlated with the outcome, testing the association between the covariate and the exposure could give some indication of the relationship between the exposure and the outcome. To make this potential bias transparent, researchers should report the variables that they already accessed in the pre-registration. Another limitation is that researchers will not be able to identify analytical issues relating to the outcome data in advance of pre-registration. Therefore, this approach will be most appropriate where researchers wish to check various characteristics of the exposure variable(s) and covariates, rather than the outcome. However, a “mixed” approach could be applied in which outcome data is provided in scrambled format, to enable researchers to also assess distributional characteristics of the outcome. This would substantially reduce the number of potential challenges to be considered in pre-registered analytical pipelines.
Pre-register a decision tree
If it is not possible to access any of the data prior to pre-registering (e.g., to enable analyses to be trialled on a dataset that is blinded or missing outcome data), researchers could pre-register a decision tree. This defines the sequence of analyses and rules based on characteristics of the observed data [ 17 ]. For example, the decision tree could specify testing a normality assumption, and based on the results, whether to use a parametric or non-parametric test. Ideally, the decision tree should provide a contingency plan for each of the planned analyses, if assumptions are not fulfilled. Of course, it can be challenging and time consuming to anticipate every potential issue with the data and plan contingencies. However, investing time into pre-specifying a decision tree (or a set of contingency plans) could save time should issues arise during data analysis, and can reduce the likelihood of deviating from the pre-registration.
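The sketch below illustrates one way a simple pre-registered decision tree could be written as executable code; the normality test, fallback test, and threshold are illustrative choices, not prescriptions from the article.

```python
import numpy as np
from scipy import stats

def compare_groups(a, b, normality_alpha: float = 0.05):
    """Pre-registered rule: run Shapiro-Wilk on both groups; if either departs from
    normality, fall back to the pre-specified non-parametric test."""
    normal = (stats.shapiro(a).pvalue > normality_alpha and
              stats.shapiro(b).pvalue > normality_alpha)
    if normal:
        return "Welch t-test", stats.ttest_ind(a, b, equal_var=False).pvalue
    return "Mann-Whitney U", stats.mannwhitneyu(a, b).pvalue

# Example with simulated skewed data: the tree routes to the non-parametric branch.
rng = np.random.default_rng(3)
group_a, group_b = rng.exponential(1.0, 200), rng.exponential(1.3, 200)
print(compare_groups(group_a, group_b))
```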
Challenge: Lack of flexibility in data analysis
Transparently report unplanned analyses
Unplanned analyses (such as applying new methods or conducting follow-up tests to investigate an interesting or unexpected finding) are a natural and often important part of the scientific process. Despite common misconceptions, pre-registration does not prevent such unplanned analyses from being included, as long as they are transparently reported as post hoc. If there are methodological deviations, we recommend that researchers should (1) clearly state the reasons for using the new method, and (2) if possible, report results from both methods, to ideally show that the change in methods was not due to the results [ 57 ]. This information can either be provided in the manuscript or in an update to the original pre-registration (e.g., on a third-party registry such as the OSF), which can be useful when journal word limits are tight. Similarly, if researchers wish to include additional follow-up analyses to investigate an interesting or unexpected finding, this should be reported but labelled as “exploratory” or “post hoc” in the manuscript.
Ensure a paper’s value does not depend on statistically significant results
Researchers may be concerned that reduced analytic flexibility from pre-registration could increase the likelihood of reporting null results [ 22 , 23 ], which are harder to publish [ 13 , 42 ]. To address this, we recommend taking steps to ensure that the value and success of a study does not depend on a significant p-value. First, methodologically strong research (e.g., with high statistical power, valid and reliable measures, robustness checks, and replication samples) will advance the field, whatever the findings. Second, methods can be applied to allow for the interpretation of statistically non-significant findings (e.g., Bayesian methods [ 58 ] or equivalence tests, which determine whether an observed effect is surprisingly small [ 52 , 59 , 60 ]). This means that the results will be informative whatever they show, in contrast to approaches relying solely on null hypothesis significance testing, where statistically non-significant findings cannot be interpreted as meaningful. Third, researchers can submit the proposed study as a Registered Report, where it will be evaluated before the results are available. This is arguably the strongest way to protect against publication bias, as in-principle study acceptance is granted without any knowledge of the results. In addition, Registered Reports can improve the methodology, as suggestions from expert reviewers can be incorporated into the pre-registered protocol.
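As an illustration of the equivalence-testing option, the sketch below implements a basic two one-sided tests (TOST) procedure for two independent groups. The smallest effect size of interest (SESOI) is a hypothetical value that a researcher would justify in the pre-registration; this is a sketch, not the procedure used in the cited papers.

```python
import numpy as np
from scipy import stats

def tost_ind(a, b, sesoi: float) -> float:
    """Two one-sided tests for independent samples: returns the TOST p-value
    (the larger of the two one-sided p-values); a small value supports equivalence
    within +/- sesoi on the raw scale."""
    p_lower = stats.ttest_ind(a + sesoi, b, equal_var=False, alternative="greater").pvalue
    p_upper = stats.ttest_ind(a - sesoi, b, equal_var=False, alternative="less").pvalue
    return max(p_lower, p_upper)

# Example: a negligible group difference that the TOST can show is inside the SESOI band.
rng = np.random.default_rng(7)
a, b = rng.normal(0.0, 1.0, 400), rng.normal(0.05, 1.0, 400)
print(tost_ind(a, b, sesoi=0.3))
```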
Under a system that rewards novel and statistically significant findings, it is easy for subconscious human biases to lead to QRPs. However, researchers, along with data guardians, journals, funders, and institutions, have a responsibility to ensure that findings are reproducible and robust. While pre-registration can help to limit analytic flexibility and selective reporting, it involves several challenges for epidemiologists conducting secondary data analysis. The approaches described here aim to address these challenges (Fig. 1 ), to either improve the efficacy of pre-registration or provide an alternative approach to address analytic flexibility (e.g., a multiverse analysis). The responsibility in adopting these approaches should not only fall on researchers’ shoulders; data guardians also have an important role to play in recording and reporting access to data, providing blinded datasets and hold-out samples, and encouraging researchers to pre-register and adopt these solutions as part of their data request. Furthermore, wider stakeholders could incentivise these practices; for example, journals could provide a designated space for researchers to report deviations from the pre-registration, and funders could provide grants to establish best practice at the cohort level (e.g., data checkout systems, blinded datasets). Ease of adoption is key to ensure wide uptake, and we therefore encourage efforts to evaluate, simplify and improve these practices. Steps that could be taken to evaluate these practices are presented in Box 1.
More broadly, it is important to emphasise that researcher biases do not operate in isolation, but rather in the context of wider publication bias and a “publish or perish” culture. These incentive structures not only promote QRPs [ 61 ], but also discourage researchers from pre-registering and adopting other time-consuming reproducible methods. Therefore, in addition to targeting bias at the individual researcher level, wider initiatives from journals, funders, and institutions are required to address these institutional biases [ 7 ]. Systemic changes that reward rigorous and reproducible research will help researchers to provide unbiased answers to science and society’s most important questions.
Box 1. Evaluation of approaches
To evaluate, simplify and improve approaches to protect against researcher bias in secondary data analysis, the following steps could be taken.
Co-creation workshops to refine approaches
To obtain feedback on the approaches (including on any practical concerns or feasibility issues) co-creation workshops could be held with researchers, data managers, and wider stakeholders (e.g., journals, funders, and institutions).
Empirical research to evaluate efficacy of approaches
To evaluate the effectiveness of the approaches in preventing researcher bias and/or improving pre-registration, empirical research is needed. For example, to test the extent to which the multiverse analysis can reduce selective reporting, comparisons could be made between effect sizes from multiverse analyses versus effect sizes from meta-analyses (of non-pre-registered studies) addressing the same research question. If smaller effect sizes were found in multiverse analyses, it would suggest that the multiverse approach can reduce selective reporting. In addition, to test whether providing a blinded dataset or dataset missing outcome variables could help researchers develop an appropriate analytical protocol, researchers could be randomly assigned to receive such a dataset (or no dataset), prior to pre-registration. If researchers who received such a dataset had fewer eventual deviations from the pre-registered protocol (in the final study), it would suggest that this approach can help ensure that proposed analyses are appropriate for the data.
Pilot implementation of the measures
To assess the practical feasibility of the approaches, data managers could pilot measures for users of the dataset (e.g., required pre-registration for access to data, provision of datasets that are blinded or missing outcome variables). Feedback could then be collected from researchers and data managers about the experience and ease of use.
Acknowledgements
The authors are grateful to Professor George Davey for his helpful comments on this article.
Author contributions
JRB and MRM developed the idea for the article. The first draft of the manuscript was written by JRB, with support from MRM and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
J.R.B is funded by a Wellcome Trust Sir Henry Wellcome fellowship (grant 215917/Z/19/Z). J.B.P is supported by the Medical Research Foundation 2018 Emerging Leaders 1st Prize in Adolescent Mental Health (MRF-160–0002-ELP-PINGA). M.R.M and H.M.S work in a unit that receives funding from the University of Bristol and the UK Medical Research Council (MC_UU_00011/5, MC_UU_00011/7), and M.R.M is also supported by the National Institute for Health Research (NIHR) Biomedical Research Centre at the University Hospitals Bristol National Health Service Foundation Trust and the University of Bristol.
Declarations
The authors declare that they have no conflict of interest.