Grad Coach

Quantitative Data Analysis 101

The lingo, methods and techniques, explained simply.

By: Derek Jansen (MBA) and Kerryn Warren (PhD) | December 2020

Quantitative data analysis is one of those things that often strikes fear in students. It’s totally understandable – quantitative analysis is a complex topic, full of daunting lingo, like medians, modes, correlation and regression. Suddenly we’re all wishing we’d paid a little more attention in math class…

The good news is that while quantitative data analysis is a mammoth topic, gaining a working understanding of the basics isn’t that hard, even for those of us who avoid numbers and math. In this post, we’ll break quantitative analysis down into simple, bite-sized chunks so you can approach your research with confidence.

Quantitative data analysis methods and techniques 101

Overview: Quantitative Data Analysis 101

  • What (exactly) is quantitative data analysis?
  • When to use quantitative analysis
  • How quantitative analysis works

The two “branches” of quantitative analysis

  • Descriptive statistics 101
  • Inferential statistics 101
  • How to choose the right quantitative methods
  • Recap & summary

What is quantitative data analysis?

Despite being a mouthful, quantitative data analysis simply means analysing data that is numbers-based – or data that can be easily “converted” into numbers without losing any meaning.

For example, category-based variables like gender, ethnicity, or native language could all be “converted” into numbers without losing meaning – for example, English could equal 1, French 2, etc.

This contrasts with qualitative data analysis, where the focus is on words, phrases and expressions that can’t be reduced to numbers. If you’re interested in learning about qualitative analysis, check out our post and video here.

What is quantitative analysis used for?

Quantitative analysis is generally used for three purposes.

  • Firstly, it’s used to measure differences between groups. For example, the popularity of different clothing colours or brands.
  • Secondly, it’s used to assess relationships between variables. For example, the relationship between weather temperature and voter turnout.
  • And third, it’s used to test hypotheses in a scientifically rigorous way. For example, a hypothesis about the impact of a certain vaccine.

Again, this contrasts with qualitative analysis, which can be used to analyse people’s perceptions and feelings about an event or situation. In other words, things that can’t be reduced to numbers.

How does quantitative analysis work?

Well, since quantitative data analysis is all about analysing numbers, it’s no surprise that it involves statistics. Statistical analysis methods form the engine that powers quantitative analysis, and these methods can vary from pretty basic calculations (for example, averages and medians) to more sophisticated analyses (for example, correlations and regressions).

Sounds like gibberish? Don’t worry – we’ll break down all the technical mumbo jumbo in this post. Importantly, you don’t need to be a statistician or math whiz to pull off a good quantitative analysis.


As I mentioned, quantitative analysis is powered by statistical analysis methods. There are two main “branches” of statistical methods that are used – descriptive statistics and inferential statistics. In your research, you might only use descriptive statistics, or you might use a mix of both, depending on what you’re trying to figure out. In other words, depending on your research questions, aims and objectives. I’ll explain how to choose your methods later.

So, what are descriptive and inferential statistics?

Well, before I can explain that, we need to take a quick detour to explain some lingo. To understand the difference between these two branches of statistics, you need to understand two important words. These words are population and sample.

First up, population. In statistics, the population is the entire group of people (or animals or organisations or whatever) that you’re interested in researching. For example, if you were interested in researching Tesla owners in the US, then the population would be all Tesla owners in the US.

However, it’s extremely unlikely that you’re going to be able to interview or survey every single Tesla owner in the US. Realistically, you’ll likely only get access to a few hundred, or maybe a few thousand owners using an online survey. This smaller group of accessible people whose data you actually collect is called your sample.

So, to recap – the population is the entire group of people you’re interested in, and the sample is the subset of the population that you can actually get access to. In other words, the population is the full chocolate cake, whereas the sample is a slice of that cake.

So, why is this sample-population thing important?

Well, descriptive statistics focus on describing the sample, while inferential statistics aim to make predictions about the population, based on the findings within the sample. In other words, we use one group of statistical methods – descriptive statistics – to investigate the slice of cake, and another group of methods – inferential statistics – to draw conclusions about the entire cake. There I go with the cake analogy again…

With that out of the way, let’s take a closer look at each of these branches in more detail.

Descriptive statistics vs inferential statistics

Branch 1: Descriptive Statistics

Descriptive statistics serve a simple but critically important role in your research – to describe your data set – hence the name. In other words, they help you understand the details of your sample. Unlike inferential statistics (which we’ll get to soon), descriptive statistics don’t aim to make inferences or predictions about the entire population – they’re purely interested in the details of your specific sample.

When you’re writing up your analysis, descriptive statistics are the first set of stats you’ll cover, before moving on to inferential statistics. But, that said, depending on your research objectives and research questions, they may be the only type of statistics you use. We’ll explore that a little later.

So, what kind of statistics are usually covered in this section?

Some common statistics used in this branch include the following:

  • Mean – this is simply the mathematical average of a range of numbers.
  • Median – this is the midpoint in a range of numbers when the numbers are arranged in numerical order. If the data set has an odd number of values, the median is the number right in the middle of the set. If the data set has an even number of values, the median is the midpoint between the two middle numbers.
  • Mode – this is simply the most commonly occurring number in the data set.
  • Standard deviation – this indicates how dispersed a range of numbers is. In cases where most of the numbers are quite close to the average, the standard deviation will be relatively low. Conversely, in cases where the numbers are scattered all over the place, the standard deviation will be relatively high.
  • Skewness – as the name suggests, skewness indicates how symmetrical a range of numbers is. In other words, do they tend to cluster into a smooth bell curve shape in the middle of the graph, or do they skew to the left or right?
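To make these terms a little more concrete, here’s a minimal sketch in Python (using the standard statistics module and scipy) that computes each of these statistics for a small, made-up sample of bodyweights. The numbers themselves are purely illustrative.

```python
from statistics import mean, median, stdev
from scipy.stats import skew

# A small, made-up sample of bodyweights (in kilograms)
weights = [55, 61, 64, 68, 71, 74, 77, 80, 84, 90]

print("Mean:", mean(weights))                 # mathematical average
print("Median:", median(weights))             # midpoint of the ordered values
print("Standard deviation:", stdev(weights))  # how spread out the values are around the mean
print("Skewness:", skew(weights))             # how symmetrical the distribution is

# Mode: with every value occurring only once, there is no meaningful mode,
# so we count occurrences ourselves rather than relying on statistics.mode().
counts = {w: weights.count(w) for w in weights}
top = max(counts.values())
print("Mode:", [w for w, c in counts.items() if c > 1] if top > 1 else "no mode")
```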

Feeling a bit confused? Let’s look at a practical example using a small data set.

Descriptive statistics example data

On the left-hand side is the data set. This details the bodyweight of a sample of 10 people. On the right-hand side, we have the descriptive statistics. Let’s take a look at each of them.

First, we can see that the mean weight is 72.4 kilograms. In other words, the average weight across the sample is 72.4 kilograms. Straightforward.

Next, we can see that the median is very similar to the mean (the average). This suggests that this data set has a reasonably symmetrical distribution (in other words, a relatively smooth, centred distribution of weights, clustered towards the centre).

In terms of the mode, there is no mode in this data set. This is because each number is present only once and so there cannot be a “most common number”. If there were two people who were both 65 kilograms, for example, then the mode would be 65.

Next up is the standard deviation. A value of 10.6 indicates that there’s quite a wide spread of numbers. We can see this quite easily by looking at the numbers themselves, which range from 55 to 90 – quite a stretch from the mean of 72.4.

And lastly, the skewness of -0.2 tells us that the data is very slightly negatively skewed. This makes sense since the mean and the median are slightly different.

As you can see, these descriptive statistics give us some useful insight into the data set. Of course, this is a very small data set (only 10 records), so we can’t read into these statistics too much. Also, keep in mind that this is not a list of all possible descriptive statistics – just the most common ones.

But why do all of these numbers matter?

While these descriptive statistics are all fairly basic, they’re important for a few reasons:

  • Firstly, they help you get both a macro and micro-level view of your data. In other words, they help you understand both the big picture and the finer details.
  • Secondly, they help you spot potential errors in the data – for example, if an average is way higher than you’d expect, or responses to a question are highly varied, this can act as a warning sign that you need to double-check the data.
  • And lastly, these descriptive statistics help inform which inferential statistical techniques you can use, as those techniques depend on the skewness (in other words, the symmetry and normality) of the data.

Simply put, descriptive statistics are really important , even though the statistical techniques used are fairly basic. All too often at Grad Coach, we see students skimming over the descriptives in their eagerness to get to the more exciting inferential methods, and then landing up with some very flawed results.

Don’t be a sucker – give your descriptive statistics the love and attention they deserve!

Examples of descriptive statistics

Branch 2: Inferential Statistics

As I mentioned, while descriptive statistics are all about the details of your specific data set – your sample – inferential statistics aim to make inferences about the population. In other words, you’ll use inferential statistics to make predictions about what you’d expect to find in the full population.

What kind of predictions, you ask? Well, there are two common types of predictions that researchers try to make using inferential stats:

  • Firstly, predictions about differences between groups – for example, height differences between children grouped by their favourite meal or gender.
  • And secondly, relationships between variables – for example, the relationship between body weight and the number of hours a week a person does yoga.

In other words, inferential statistics (when done correctly) allow you to connect the dots and make predictions about what you expect to see in the real-world population, based on what you observe in your sample data. For this reason, inferential statistics are used for hypothesis testing – in other words, to test hypotheses that predict changes or differences.

Inferential statistics are used to make predictions about what you’d expect to find in the full population, based on the sample.

Of course, when you’re working with inferential statistics, the composition of your sample is really important. In other words, if your sample doesn’t accurately represent the population you’re researching, then your findings won’t necessarily be very useful.

For example, if your population of interest is a mix of 50% male and 50% female, but your sample is 80% male, you can’t make inferences about the population based on your sample, since it’s not representative. This area of statistics is called sampling, but we won’t go down that rabbit hole here (it’s a deep one!) – we’ll save that for another post.

What statistics are usually used in this branch?

There are many, many different statistical analysis methods within the inferential branch and it’d be impossible for us to discuss them all here. So we’ll just take a look at some of the most common inferential statistical methods so that you have a solid starting point.

First up are T-tests. T-tests compare the means (the averages) of two groups of data to assess whether they’re statistically significantly different. In other words, is the difference between the two group means bigger than what you’d expect from chance alone?

This type of testing is very useful for understanding just how similar or different two groups of data are. For example, you might want to compare the mean blood pressure between two groups of people – one that has taken a new medication and one that hasn’t – to assess whether they are significantly different.
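As an illustration of the blood pressure example above, here’s a minimal sketch of an independent-samples t-test in Python using scipy. The readings are invented purely for demonstration.

```python
from scipy.stats import ttest_ind

# Hypothetical systolic blood pressure readings (mmHg)
medication_group = [128, 131, 125, 122, 130, 127, 124, 129]
control_group    = [138, 135, 140, 133, 137, 141, 136, 139]

# Independent-samples t-test: are the two group means significantly different?
t_stat, p_value = ttest_ind(medication_group, control_group)

print(f"t statistic: {t_stat:.2f}, p value: {p_value:.4f}")
# A p value below the chosen significance level (commonly 0.05) suggests
# that the difference in means is unlikely to be due to chance alone.
```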

Kicking things up a level, we have ANOVA, which stands for “analysis of variance”. This test is similar to a T-test in that it compares the means of various groups, but ANOVA allows you to analyse multiple groups, not just two. So it’s basically a T-test on steroids…
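In the same spirit, a one-way ANOVA across three or more groups can be run with scipy’s f_oneway function. This is just a sketch with made-up numbers; a real ANOVA would also involve checking assumptions and running post-hoc tests.

```python
from scipy.stats import f_oneway

# Made-up test scores for three groups (e.g., three different teaching methods)
group_a = [72, 75, 78, 70, 74]
group_b = [80, 82, 79, 85, 81]
group_c = [68, 71, 66, 73, 69]

# One-way ANOVA: do the group means differ more than chance alone would suggest?
f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F statistic: {f_stat:.2f}, p value: {p_value:.4f}")
```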

Next, we have correlation analysis. This type of analysis assesses the relationship between two variables. In other words, if one variable increases, does the other variable also increase, decrease or stay the same? For example, if the average temperature goes up, do average ice cream sales increase too? We’d expect some sort of relationship between these two variables intuitively, but correlation analysis allows us to measure that relationship scientifically.
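Here’s a minimal sketch of a correlation analysis in Python, using the temperature and ice cream example above with invented numbers.

```python
from scipy.stats import pearsonr

# Made-up daily data: average temperature (°C) and ice cream sales (units)
temperature = [18, 21, 23, 25, 27, 30, 32, 34]
ice_cream_sales = [110, 135, 150, 170, 180, 210, 230, 245]

# Pearson correlation: r close to +1 indicates a strong positive relationship
r, p_value = pearsonr(temperature, ice_cream_sales)
print(f"Correlation coefficient r: {r:.2f}, p value: {p_value:.4f}")
```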

Lastly, we have regression analysis – this is quite similar to correlation in that it assesses the relationship between variables, but it goes a step further by estimating how one variable changes as another changes, which is useful when you’re investigating potential cause and effect, not just whether variables move together. In other words, does one variable actually cause the other one to move, or do they just happen to move together naturally thanks to another force? Just because two variables correlate doesn’t necessarily mean that one causes the other.
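And here’s a sketch of a simple linear regression using scipy’s linregress, with made-up data on weekly yoga hours and body weight (echoing the earlier example). Keep in mind that, as noted above, a regression model by itself still doesn’t prove that one variable causes the other.

```python
from scipy.stats import linregress

# Made-up data: hours of yoga per week vs. body weight (kg)
yoga_hours = [0, 1, 2, 3, 4, 5, 6, 7]
body_weight = [88, 86, 85, 82, 80, 79, 77, 76]

# Fit a simple linear regression: body_weight ≈ intercept + slope * yoga_hours
result = linregress(yoga_hours, body_weight)

print(f"Slope: {result.slope:.2f} kg per extra hour of yoga")
print(f"Intercept: {result.intercept:.2f} kg")
print(f"R squared: {result.rvalue ** 2:.2f}")  # proportion of variance explained
```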

Stats overload…

I hear you. To make this all a little more tangible, let’s take a look at an example of a correlation in action.

Here’s a scatter plot demonstrating the correlation (relationship) between weight and height. Intuitively, we’d expect there to be some relationship between these two variables, which is what we see in this scatter plot. In other words, the results tend to cluster together in a diagonal line from bottom left to top right.

Sample correlation

As I mentioned, these are just a handful of inferential techniques – there are many, many more. Importantly, each statistical method has its own assumptions and limitations.

For example, some methods only work with normally distributed (parametric) data, while other methods are designed specifically for non-parametric data. And that’s exactly why descriptive statistics are so important – they’re the first step to knowing which inferential techniques you can and can’t use.

Remember that every statistical method has its own assumptions and limitations,  so you need to be aware of these.

How to choose the right analysis method

To choose the right statistical methods, you need to think about two important factors:

  • The type of quantitative data you have (specifically, the level of measurement and the shape of the data), and
  • Your research questions and hypotheses.

Let’s take a closer look at each of these.

Factor 1 – Data type

The first thing you need to consider is the type of data you’ve collected (or the type of data you will collect). By data types, I’m referring to the four levels of measurement – namely, nominal, ordinal, interval and ratio. If you’re not familiar with this lingo, check out the video below.

Why does this matter?

Well, because different statistical methods and techniques require different types of data. This is one of the “assumptions” I mentioned earlier – every method has its assumptions regarding the type of data.

For example, some techniques work with categorical data (for example, yes/no type questions, or gender or ethnicity), while others work with continuous numerical data (for example, age, weight or income) – and, of course, some work with multiple data types.

If you try to use a statistical method that doesn’t support the data type you have, your results will be largely meaningless. So, make sure that you have a clear understanding of what types of data you’ve collected (or will collect). Once you have this, you can then check which statistical methods would support your data types here.

If you haven’t collected your data yet, you can work in reverse and look at which statistical method would give you the most useful insights, and then design your data collection strategy to collect the correct data types.

Another important factor to consider is the shape of your data. Specifically, does it have a normal distribution (in other words, is it a bell-shaped curve, centred in the middle) or is it very skewed to the left or the right? Again, different statistical techniques work for different shapes of data – some are designed for symmetrical data while others are designed for skewed data.

This is another reminder of why descriptive statistics are so important – they tell you all about the shape of your data.
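As an illustration, here’s a minimal sketch of how you might check the shape of a variable in Python before choosing between a parametric test (like a t-test) and a non-parametric alternative. The sample values and the 0.05 cut-off are illustrative assumptions, not hard rules.

```python
from scipy.stats import shapiro, skew

# Made-up sample of a continuous variable (e.g., reaction times in seconds)
sample = [1.2, 1.4, 1.3, 1.6, 1.5, 1.8, 2.9, 1.4, 1.7, 1.5]

print(f"Skewness: {skew(sample):.2f}")  # around 0 = symmetrical; positive = right-skewed

# Shapiro-Wilk test: a small p value suggests the data deviate from normality
stat, p_value = shapiro(sample)
if p_value < 0.05:
    print("Data look non-normal: consider a non-parametric test (e.g., Mann-Whitney U).")
else:
    print("No strong evidence against normality: a parametric test may be appropriate.")
```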

Factor 2: Your research questions

The next thing you need to consider is your specific research questions, as well as your hypotheses (if you have some). The nature of your research questions and research hypotheses will heavily influence which statistical methods and techniques you should use.

If you’re just interested in understanding the attributes of your sample (as opposed to the entire population), then descriptive statistics are probably all you need. For example, if you just want to assess the means (averages) and medians (centre points) of variables in a group of people.

On the other hand, if you aim to understand differences between groups or relationships between variables and to infer or predict outcomes in the population, then you’ll likely need both descriptive statistics and inferential statistics.

So, it’s really important to get very clear about your research aims and research questions, as well as your hypotheses – before you start looking at which statistical techniques to use.

Never shoehorn a specific statistical technique into your research just because you like it or have some experience with it. Your choice of methods must align with all the factors we’ve covered here.

Time to recap…

You’re still with me? That’s impressive. We’ve covered a lot of ground here, so let’s recap on the key points:

  • Quantitative data analysis is all about analysing number-based data (which includes categorical and numerical data) using various statistical techniques.
  • The two main branches of statistics are descriptive statistics and inferential statistics. Descriptives describe your sample, whereas inferentials make predictions about what you’ll find in the population.
  • Common descriptive statistical methods include the mean (average), median, mode, standard deviation and skewness.
  • Common inferential statistical methods include t-tests, ANOVA, correlation and regression analysis.
  • To choose the right statistical methods and techniques, you need to consider the type of data you’re working with, as well as your research questions and hypotheses.


Psst… there’s more (for free)

This post is part of our dissertation mini-course, which covers everything you need to get started with your dissertation, thesis or research project. 





What Is Data Analysis? (With Examples)

Data analysis is the practice of working with data to glean useful information, which can then be used to make informed decisions.


"It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts," Sherlock Holme's proclaims in Sir Arthur Conan Doyle's A Scandal in Bohemia.

This idea lies at the root of data analysis. When we can extract meaning from data, it empowers us to make better decisions. And we’re living in a time when we have more data than ever at our fingertips.

Companies are wisening up to the benefits of leveraging data. Data analysis can help a bank to personalize customer interactions, a health care system to predict future health needs, or an entertainment company to create the next big streaming hit.

The World Economic Forum Future of Jobs Report 2023 listed data analysts and scientists among the most in-demand jobs, alongside AI and machine learning specialists and big data specialists [ 1 ]. In this article, you'll learn more about the data analysis process, different types of data analysis, and recommended courses to help you get started in this exciting field.

Read more: How to Become a Data Analyst (with or Without a Degree)

Beginner-friendly data analysis courses

Interested in building your knowledge of data analysis today? Consider enrolling in one of these popular courses on Coursera:

In Google's Foundations: Data, Data, Everywhere course, you'll explore key data analysis concepts, tools, and jobs.

In Duke University's Data Analysis and Visualization course, you'll learn how to identify key components for data analytics projects, explore data visualization, and find out how to create a compelling data story.

Data analysis process

As the data available to companies continues to grow both in amount and complexity, so too does the need for an effective and efficient process by which to harness the value of that data. The data analysis process typically moves through several iterative phases. Let’s take a closer look at each.

Identify the business question you’d like to answer. What problem is the company trying to solve? What do you need to measure, and how will you measure it? 

Collect the raw data sets you’ll need to help you answer the identified question. Data collection might come from internal sources, like a company’s client relationship management (CRM) software, or from secondary sources, like government records or social media application programming interfaces (APIs). 

Clean the data to prepare it for analysis. This often involves purging duplicate and anomalous data, reconciling inconsistencies, standardizing data structure and format, and dealing with white spaces and other syntax errors.

Analyze the data. By manipulating the data using various data analysis techniques and tools, you can begin to find trends, correlations, outliers, and variations that tell a story. During this stage, you might use data mining to discover patterns within databases or data visualization software to help transform data into an easy-to-understand graphical format.

Interpret the results of your analysis to see how well the data answered your original question. What recommendations can you make based on the data? What are the limitations to your conclusions? 
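As a rough illustration of the cleaning and analysis steps described above, here's a minimal pandas sketch. The column names and values are hypothetical and stand in for whatever raw data a real project would collect.

```python
import pandas as pd

# Hypothetical raw survey export with duplicates and messy text
raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "region": [" North", "south ", "south ", "North", "EAST"],
    "monthly_spend": [120.0, 95.5, 95.5, None, 210.0],
})

# Clean: drop duplicate rows, standardize text, handle missing values
clean = (
    raw.drop_duplicates()
       .assign(region=lambda df: df["region"].str.strip().str.title())
       .dropna(subset=["monthly_spend"])
)

# Analyze: a simple summary of spend by region
print(clean.groupby("region")["monthly_spend"].mean())
```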

Learn more about data analysis in this lecture by Kevin, Director of Data Analytics at Google, from Google's Data Analytics Professional Certificate:

Read more: What Does a Data Analyst Do? A Career Guide

Types of data analysis (with examples)

Data can be used to answer questions and support decisions in many different ways. To identify the best way to analyze your data, it can help to familiarize yourself with the four types of data analysis commonly used in the field.

In this section, we’ll take a look at each of these data analysis methods, along with an example of how each might be applied in the real world.

Descriptive analysis

Descriptive analysis tells us what happened. This type of analysis helps describe or summarize quantitative data by presenting statistics. For example, descriptive statistical analysis could show the distribution of sales across a group of employees and the average sales figure per employee. 

Descriptive analysis answers the question, “what happened?”
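For instance, a tiny descriptive analysis of the sales example might look like the following Python sketch (the employee names and sales figures are invented):

```python
import pandas as pd

# Hypothetical monthly sales (in units) per employee
sales = pd.Series([42, 38, 55, 61, 47, 50, 39, 58],
                  index=["Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus", "Hana"])

# Descriptive analysis: summarize what happened
print(sales.describe())                              # count, mean, std, min, quartiles, max
print("Average sales per employee:", sales.mean())
print("Top performer:", sales.idxmax())
```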

Diagnostic analysis

If the descriptive analysis determines the “what,” diagnostic analysis determines the “why.” Let’s say a descriptive analysis shows an unusual influx of patients in a hospital. Drilling into the data further might reveal that many of these patients shared symptoms of a particular virus. This diagnostic analysis can help you determine that an infectious agent—the “why”—led to the influx of patients.

Diagnostic analysis answers the question, “why did it happen?”

Predictive analysis

So far, we’ve looked at types of analysis that examine and draw conclusions about the past. Predictive analytics uses data to form projections about the future. Using predictive analysis, you might notice that a given product has had its best sales during the months of September and October each year, leading you to predict a similar high point during the upcoming year.

Predictive analysis answers the question, “what might happen in the future?”

Prescriptive analysis

Prescriptive analysis takes all the insights gathered from the first three types of analysis and uses them to form recommendations for how a company should act. Using our previous example, this type of analysis might suggest a market plan to build on the success of the high sales months and harness new growth opportunities in the slower months. 

Prescriptive analysis answers the question, “what should we do about it?”

This last type is where the concept of data-driven decision-making comes into play.

Read more: Advanced Analytics: Definition, Benefits, and Use Cases

What is data-driven decision-making (DDDM)?

Data-driven decision-making, sometimes abbreviated to DDDM, can be defined as the process of making strategic business decisions based on facts, data, and metrics instead of intuition, emotion, or observation.

This might sound obvious, but in practice, not all organizations are as data-driven as they could be. According to global management consulting firm McKinsey Global Institute, data-driven companies are better at acquiring new customers, maintaining customer loyalty, and achieving above-average profitability [ 2 ].

Get started with Coursera

If you’re interested in a career in the high-growth field of data analytics, consider these top-rated courses on Coursera:

Begin building job-ready skills with the Google Data Analytics Professional Certificate. Prepare for an entry-level job as you learn from Google employees—no experience or degree required.

Practice working with data with Macquarie University's Excel Skills for Business Specialization. Learn how to use Microsoft Excel to analyze data and make data-informed business decisions.

Deepen your skill set with Google's Advanced Data Analytics Professional Certificate. In this advanced program, you'll continue exploring the concepts introduced in the beginner-level courses, plus learn Python, statistics, and machine learning concepts.

Frequently asked questions (FAQ)

Where is data analytics used?

Just about any business or organization can use data analytics to help inform their decisions and boost their performance. Some of the most successful companies across a range of industries — from Amazon and Netflix to Starbucks and General Electric — integrate data into their business plans to improve their overall business performance.

What are the top skills for a data analyst?

Data analysis makes use of a range of analysis tools and technologies. Some of the top skills for data analysts include SQL, data visualization, statistical programming languages (like R and Python), machine learning, and spreadsheets.

Read: 7 In-Demand Data Analyst Skills to Get Hired in 2022

What is a data analyst job salary?

Data from Glassdoor indicates that the average base salary for a data analyst in the United States is $75,349 as of March 2024 [ 3 ]. How much you make will depend on factors like your qualifications, experience, and location.

Do data analysts need to be good at math?

Data analytics tends to be less math-intensive than data science. While you probably won't need to master any advanced mathematics, a foundation in basic math and statistical analysis can help set you up for success.

Learn more: Data Analyst vs. Data Scientist: What's the Difference?

Article sources

World Economic Forum. "The Future of Jobs Report 2023," https://www3.weforum.org/docs/WEF_Future_of_Jobs_2023.pdf. Accessed March 19, 2024.

McKinsey & Company. "Five facts: How customer analytics boosts corporate performance," https://www.mckinsey.com/business-functions/marketing-and-sales/our-insights/five-facts-how-customer-analytics-boosts-corporate-performance. Accessed March 19, 2024.

Glassdoor. "Data Analyst Salaries," https://www.glassdoor.com/Salaries/data-analyst-salary-SRCH_KO0,12.htm. Accessed March 19, 2024.



Content Analysis

Content analysis is a research tool used to determine the presence of certain words, themes, or concepts within some given qualitative data (i.e. text). Using content analysis, researchers can quantify and analyze the presence, meanings, and relationships of such words, themes, or concepts. As an example, researchers can evaluate language used within a news article to search for bias or partiality. Researchers can then make inferences about the messages within the texts, the writer(s), the audience, and even the culture and time surrounding the text.

Description

Sources of data could be from interviews, open-ended questions, field research notes, conversations, or literally any occurrence of communicative language (such as books, essays, discussions, newspaper headlines, speeches, media, historical documents). A single study may analyze various forms of text in its analysis. To analyze the text using content analysis, the text must be coded, or broken down, into manageable categories for analysis (i.e. “codes”). Once the text is coded, the codes can then be grouped into broader “code categories” to summarize the data even further.

Three different definitions of content analysis are provided below.

Definition 1: “Any technique for making inferences by systematically and objectively identifying special characteristics of messages.” (from Holsti, 1968)

Definition 2: “An interpretive and naturalistic approach. It is both observational and narrative in nature and relies less on the experimental elements normally associated with scientific research (reliability, validity, and generalizability).” (from Ethnography, Observational Research, and Narrative Inquiry, 1994-2012)

Definition 3: “A research technique for the objective, systematic and quantitative description of the manifest content of communication.” (from Berelson, 1952)

Uses of Content Analysis

Identify the intentions, focus or communication trends of an individual, group or institution

Describe attitudinal and behavioral responses to communications

Determine the psychological or emotional state of persons or groups

Reveal international differences in communication content

Reveal patterns in communication content

Pre-test and improve an intervention or survey prior to launch

Analyze focus group interviews and open-ended questions to complement quantitative data

Types of Content Analysis

There are two general types of content analysis: conceptual analysis and relational analysis. Conceptual analysis determines the existence and frequency of concepts in a text. Relational analysis develops the conceptual analysis further by examining the relationships among concepts in a text. Each type of analysis may lead to different results, conclusions, interpretations and meanings.

Conceptual Analysis

Typically people think of conceptual analysis when they think of content analysis. In conceptual analysis, a concept is chosen for examination and the analysis involves quantifying and counting its presence. The main goal is to examine the occurrence of selected terms in the data. Terms may be explicit or implicit. Explicit terms are easy to identify. Coding of implicit terms is more complicated: you need to decide the level of implication and base judgments on subjectivity (an issue for reliability and validity). Therefore, coding of implicit terms involves using a dictionary or contextual translation rules or both.

To begin a conceptual content analysis, first identify the research question and choose a sample or samples for analysis. Next, the text must be coded into manageable content categories. This is basically a process of selective reduction. By reducing the text to categories, the researcher can focus on and code for specific words or patterns that inform the research question.

General steps for conducting a conceptual content analysis:

1. Decide the level of analysis: word, word sense, phrase, sentence, themes

2. Decide how many concepts to code for: develop a pre-defined or interactive set of categories or concepts. Decide either: A. to allow flexibility to add categories through the coding process, or B. to stick with the pre-defined set of categories.

Option A allows for the introduction and analysis of new and important material that could have significant implications to one’s research question.

Option B allows the researcher to stay focused and examine the data for specific concepts.

3. Decide whether to code for existence or frequency of a concept. The decision changes the coding process.

When coding for the existence of a concept, the researcher would count a concept only once if it appeared at least once in the data and no matter how many times it appeared.

When coding for the frequency of a concept, the researcher would count the number of times a concept appears in a text.

4. Decide on how you will distinguish among concepts:

Should words be coded exactly as they appear, or coded as the same when they appear in different forms? For example, “dangerous” vs. “dangerousness”. The point here is to create coding rules so that these word segments are transparently categorized in a logical fashion. The rules could make all of these word segments fall into the same category, or perhaps the rules can be formulated so that the researcher can distinguish these word segments into separate codes.

What level of implication is to be allowed? Words that imply the concept or words that explicitly state the concept? For example, “dangerous” vs. “the person is scary” vs. “that person could cause harm to me”. These word segments may not merit separate categories, due to the implicit meaning of “dangerous”.

5. Develop rules for coding your texts. After decisions of steps 1-4 are complete, a researcher can begin developing rules for translation of text into codes. This will keep the coding process organized and consistent. The researcher can code for exactly what he/she wants to code. Validity of the coding process is ensured when the researcher is consistent and coherent in their codes, meaning that they follow their translation rules. In content analysis, abiding by the translation rules is equivalent to validity.

6. Decide what to do with irrelevant information: should this be ignored (e.g. common English words like “the” and “and”), or used to reexamine the coding scheme in the case that it would add to the outcome of coding?

7. Code the text: This can be done by hand or by using software. By using software, researchers can input categories and have coding done automatically, quickly and efficiently, by the software program. When coding is done by hand, a researcher can recognize errors far more easily (e.g. typos, misspelling). If using computer coding, text could be cleaned of errors to include all available data. This decision of hand vs. computer coding is most relevant for implicit information where category preparation is essential for accurate coding.

8. Analyze your results: Draw conclusions and generalizations where possible. Determine what to do with irrelevant, unwanted, or unused text: reexamine, ignore, or reassess the coding scheme. Interpret results carefully as conceptual content analysis can only quantify the information. Typically, general trends and patterns can be identified.
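As a simple illustration of coding for frequency (steps 3 and 7 above), the following Python sketch counts how often a small set of concept terms appears in a text. The text and the coding scheme are invented; in a real study the terms would come from explicit translation rules.

```python
import re
from collections import Counter

text = """The new policy was described as dangerous by several experts,
while others argued the dangers were exaggerated and the plan was safe."""

# Hypothetical coding scheme: each concept maps to the word forms that count toward it
coding_scheme = {
    "danger": ["danger", "dangerous", "dangers"],
    "safety": ["safe", "safety"],
}

words = re.findall(r"[a-z']+", text.lower())
counts = Counter()
for concept, terms in coding_scheme.items():
    counts[concept] = sum(words.count(term) for term in terms)

print(counts)  # frequency of each concept in the text
```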

Relational Analysis

Relational analysis begins like conceptual analysis, where a concept is chosen for examination. However, the analysis involves exploring the relationships between concepts. Individual concepts are viewed as having no inherent meaning and rather the meaning is a product of the relationships among concepts.

To begin a relational content analysis, first identify a research question and choose a sample or samples for analysis. The research question must be focused so the concept types are not open to interpretation and can be summarized. Next, select text for analysis. Select text for analysis carefully by balancing having enough information for a thorough analysis so results are not limited with having information that is too extensive so that the coding process becomes too arduous and heavy to supply meaningful and worthwhile results.

There are three subcategories of relational analysis to choose from prior to going on to the general steps.

Affect extraction: an emotional evaluation of concepts explicit in a text. A challenge to this method is that emotions can vary across time, populations, and space. However, it could be effective at capturing the emotional and psychological state of the speaker or writer of the text.

Proximity analysis: an evaluation of the co-occurrence of explicit concepts in the text. Text is defined as a string of words called a “window” that is scanned for the co-occurrence of concepts. The result is the creation of a “concept matrix”, or a group of interrelated co-occurring concepts that would suggest an overall meaning.

Cognitive mapping: a visualization technique for either affect extraction or proximity analysis. Cognitive mapping attempts to create a model of the overall meaning of the text such as a graphic map that represents the relationships between concepts.

General steps for conducting a relational content analysis:

1. Determine the type of analysis: Once the sample has been selected, the researcher needs to determine what types of relationships to examine and the level of analysis: word, word sense, phrase, sentence, themes.

2. Reduce the text to categories and code for words or patterns. A researcher can code for existence of meanings or words.

3. Explore the relationship between concepts: once the words are coded, the text can be analyzed for the following:

Strength of relationship: degree to which two or more concepts are related.

Sign of relationship: are concepts positively or negatively related to each other?

Direction of relationship: the types of relationship that categories exhibit. For example, “X implies Y” or “X occurs before Y” or “if X then Y” or if X is the primary motivator of Y.

4. Code the relationships: a difference between conceptual and relational analysis is that the statements or relationships between concepts are coded.

5. Perform statistical analyses: explore differences or look for relationships among the identified variables during coding.

6. Map out representations: such as decision mapping and mental models.
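To illustrate the proximity-analysis idea described above (scanning a “window” of words for co-occurring concepts), here is a minimal Python sketch. The text, concept terms and window size are illustrative assumptions.

```python
import re
from itertools import combinations
from collections import Counter

text = ("Residents said the mine was dangerous, and the company insisted "
        "the mine was safe despite earlier accidents at the dangerous site.")

concepts = ["mine", "dangerous", "safe"]   # hypothetical concept terms
window_size = 8                            # number of consecutive words scanned at a time

words = re.findall(r"[a-z']+", text.lower())
co_occurrence = Counter()

# Slide a fixed-size window across the text and record which concept pairs co-occur
for start in range(len(words) - window_size + 1):
    window = set(words[start:start + window_size])
    for a, b in combinations(concepts, 2):
        if a in window and b in window:
            co_occurrence[(a, b)] += 1

print(co_occurrence)  # counts of windows in which each concept pair co-occurs
```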

Reliability and Validity

Reliability: Because of the human nature of researchers, coding errors can never be eliminated but only minimized. Generally, 80% is an acceptable margin for reliability. Three criteria comprise the reliability of a content analysis:

Stability: the tendency for coders to consistently re-code the same data in the same way over a period of time.

Reproducibility: tendency for a group of coders to classify category membership in the same way.

Accuracy: extent to which the classification of text corresponds to a standard or norm statistically.
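As a simple illustration of checking intercoder reliability, the following Python sketch computes the percentage agreement between two coders who coded the same ten text segments. The codes are made up; note that percentage agreement is only the most basic measure, and chance-corrected statistics such as Cohen's kappa are often preferred.

```python
# Hypothetical category codes assigned to ten text segments by two independent coders
coder_1 = ["danger", "safety", "danger", "other", "danger", "safety", "other", "danger", "safety", "danger"]
coder_2 = ["danger", "safety", "other",  "other", "danger", "safety", "other", "danger", "danger", "danger"]

# Percentage agreement: share of segments given the same code by both coders
agreements = sum(c1 == c2 for c1, c2 in zip(coder_1, coder_2))
percent_agreement = 100 * agreements / len(coder_1)

print(f"Percentage agreement: {percent_agreement:.0f}%")  # 80% is often cited as an acceptable threshold
```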

Validity: Three criteria comprise the validity of a content analysis:

Closeness of categories: this can be achieved by utilizing multiple classifiers to arrive at an agreed upon definition of each specific category. Using multiple classifiers, a concept category that may be an explicit variable can be broadened to include synonyms or implicit variables.

Conclusions: What level of implication is allowable? Do conclusions correctly follow the data? Are results explainable by other phenomena? This becomes especially problematic when using computer software for analysis and distinguishing between synonyms. For example, the word “mine,” variously denotes a personal pronoun, an explosive device, and a deep hole in the ground from which ore is extracted. Software can obtain an accurate count of that word’s occurrence and frequency, but not be able to produce an accurate accounting of the meaning inherent in each particular usage. This problem could throw off one’s results and make any conclusion invalid.

Generalizability of the results to a theory: dependent on the clear definitions of concept categories, how they are determined and how reliable they are at measuring the idea one is seeking to measure. Generalizability parallels reliability as much of it depends on the three criteria for reliability.

Advantages of Content Analysis

Directly examines communication using text

Allows for both qualitative and quantitative analysis

Provides valuable historical and cultural insights over time

Allows a closeness to data

Coded form of the text can be statistically analyzed

Unobtrusive means of analyzing interactions

Provides insight into complex models of human thought and language use

When done well, is considered a relatively “exact” research method

Content analysis is a readily-understood and an inexpensive research method

A more powerful tool when combined with other research methods such as interviews, observation, and use of archival records. It is very useful for analyzing historical material, especially for documenting trends over time.

Disadvantages of Content Analysis

Can be extremely time consuming

Is subject to increased error, particularly when relational analysis is used to attain a higher level of interpretation

Is often devoid of theoretical base, or attempts too liberally to draw meaningful inferences about the relationships and impacts implied in a study

Is inherently reductive, particularly when dealing with complex texts

Tends too often to simply consist of word counts

Often disregards the context that produced the text, as well as the state of things after the text is produced

Can be difficult to automate or computerize

Textbooks & Chapters  

Berelson, Bernard. Content Analysis in Communication Research.New York: Free Press, 1952.

Busha, Charles H. and Stephen P. Harter. Research Methods in Librarianship: Techniques and Interpretation.New York: Academic Press, 1980.

de Sola Pool, Ithiel. Trends in Content Analysis. Urbana: University of Illinois Press, 1959.

Krippendorff, Klaus. Content Analysis: An Introduction to its Methodology. Beverly Hills: Sage Publications, 1980.

Fielding, NG & Lee, RM. Using Computers in Qualitative Research. SAGE Publications, 1991. (Refer to Chapter by Seidel, J. ‘Method and Madness in the Application of Computer Technology to Qualitative Data Analysis’.)

Methodological Articles  

Hsieh HF & Shannon SE. (2005). Three Approaches to Qualitative Content Analysis.Qualitative Health Research. 15(9): 1277-1288.

Elo S, Kaarianinen M, Kanste O, Polkki R, Utriainen K, & Kyngas H. (2014). Qualitative Content Analysis: A focus on trustworthiness. Sage Open. 4:1-10.

Application Articles  

Abroms LC, Padmanabhan N, Thaweethai L, & Phillips T. (2011). iPhone Apps for Smoking Cessation: A content analysis. American Journal of Preventive Medicine. 40(3):279-285.

Ullstrom S. Sachs MA, Hansson J, Ovretveit J, & Brommels M. (2014). Suffering in Silence: a qualitative study of second victims of adverse events. British Medical Journal, Quality & Safety Issue. 23:325-331.

Owen P. (2012).Portrayals of Schizophrenia by Entertainment Media: A Content Analysis of Contemporary Movies. Psychiatric Services. 63:655-659.

Choosing whether to conduct a content analysis by hand or by using computer software can be difficult. Refer to ‘Method and Madness in the Application of Computer Technology to Qualitative Data Analysis’ listed above in “Textbooks and Chapters” for a discussion of the issue.

QSR NVivo:  http://www.qsrinternational.com/products.aspx

Atlas.ti:  http://www.atlasti.com/webinars.html

R- RQDA package:  http://rqda.r-forge.r-project.org/

Rolly Constable, Marla Cowell, Sarita Zornek Crawford, David Golden, Jake Hartvigsen, Kathryn Morgan, Anne Mudgett, Kris Parrish, Laura Thomas, Erika Yolanda Thompson, Rosie Turner, and Mike Palmquist. (1994-2012). Ethnography, Observational Research, and Narrative Inquiry. Writing@CSU. Colorado State University. Available at: https://writing.colostate.edu/guides/guide.cfm?guideid=63 .

As an introduction to Content Analysis by Michael Palmquist, this is the main resource on Content Analysis on the Web. It is comprehensive, yet succinct. It includes examples and an annotated bibliography. The information contained in the narrative above draws heavily from and summarizes Michael Palmquist’s excellent resource on Content Analysis but was streamlined for the purpose of doctoral students and junior researchers in epidemiology.

At Columbia University Mailman School of Public Health, more detailed training is available through the Department of Sociomedical Sciences- P8785 Qualitative Research Methods.


How to conduct a meta-analysis in eight steps: a practical guide

  • Open access
  • Published: 30 November 2021
  • Volume 72, pages 1–19 (2022)


By Christopher Hansen, Holger Steinmetz and Jörn Block


1 Introduction

“Scientists have known for centuries that a single study will not resolve a major issue. Indeed, a small sample study will not even resolve a minor issue. Thus, the foundation of science is the cumulation of knowledge from the results of many studies.” (Hunter et al. 1982 , p. 10)

Meta-analysis is a central method for knowledge accumulation in many scientific fields (Aguinis et al. 2011c ; Kepes et al. 2013 ). Similar to a narrative review, it serves as a synopsis of a research question or field. However, going beyond a narrative summary of key findings, a meta-analysis adds value in providing a quantitative assessment of the relationship between two target variables or the effectiveness of an intervention (Gurevitch et al. 2018 ). Also, it can be used to test competing theoretical assumptions against each other or to identify important moderators where the results of different primary studies differ from each other (Aguinis et al. 2011b ; Bergh et al. 2016 ). Rooted in the synthesis of the effectiveness of medical and psychological interventions in the 1970s (Glass 2015 ; Gurevitch et al. 2018 ), meta-analysis is nowadays also an established method in management research and related fields.

The increasing importance of meta-analysis in management research has resulted in the publication of guidelines in recent years that discuss the merits and best practices in various fields, such as general management (Bergh et al. 2016; Combs et al. 2019; Gonzalez-Mulé and Aguinis 2018), international business (Steel et al. 2021), economics and finance (Geyer-Klingeberg et al. 2020; Havranek et al. 2020), marketing (Eisend 2017; Grewal et al. 2018), and organizational studies (DeSimone et al. 2020; Rudolph et al. 2020). These articles discuss existing and trending methods and propose solutions for frequently encountered problems. This editorial briefly summarizes the insights of these papers; provides a workflow of the essential steps in conducting a meta-analysis; suggests state-of-the-art methodological procedures; and points to other articles for in-depth investigation. Thus, this article has two goals: (1) based on the findings of previous editorials and methodological articles, it defines methodological recommendations for meta-analyses submitted to Management Review Quarterly (MRQ); and (2) it serves as a practical guide for researchers who have little experience with meta-analysis as a method but plan to conduct one in the future.

2 Eight steps in conducting a meta-analysis

2.1 Step 1: Defining the research question

The first step in conducting a meta-analysis, as with any other empirical study, is the definition of the research question. Most importantly, the research question determines the realm of constructs to be considered or the type of interventions whose effects shall be analyzed. When defining the research question, two hurdles might arise. First, when defining an adequate study scope, researchers must consider that the number of publications has grown exponentially in many fields of research in recent decades (Fortunato et al. 2018). On the one hand, a larger number of studies increases the potentially relevant literature basis and enables researchers to conduct meta-analyses. On the other hand, screening a large number of potentially relevant studies can result in an unmanageable workload. Thus, Steel et al. (2021) highlight the importance of balancing manageability and relevance when defining the research question. Second, like the number of primary studies, the number of meta-analyses in management research has also grown strongly in recent years (Geyer-Klingeberg et al. 2020; Rauch 2020; Schwab 2015). Therefore, it is likely that one or several meta-analyses for many topics of high scholarly interest already exist. However, this should not deter researchers from investigating their research questions. One possibility is to consider moderators or mediators of a relationship that have previously been ignored. For example, a meta-analysis about startup performance could investigate the impact of different ways to measure the performance construct (e.g., growth vs. profitability vs. survival time) or certain characteristics of the founders as moderators. Another possibility is to replicate previous meta-analyses and test whether their findings can be confirmed with an updated sample of primary studies or newly developed methods. Frequent replications and updates of meta-analyses are important contributions to cumulative science and are increasingly called for by the research community (Anderson & Kichkha 2017; Steel et al. 2021). Consistent with its focus on replication studies (Block and Kuckertz 2018), MRQ therefore also invites authors to submit replication meta-analyses.

2.2 Step 2: Literature search

2.2.1 Search strategies

Similar to conducting a literature review, the search process of a meta-analysis should be systematic, reproducible, and transparent, resulting in a sample that includes all relevant studies (Fisch and Block 2018 ; Gusenbauer and Haddaway 2020 ). There are several identification strategies for relevant primary studies when compiling meta-analytical datasets (Harari et al. 2020 ). First, previous meta-analyses on the same or a related topic may provide lists of included studies that offer a good starting point to identify and become familiar with the relevant literature. This practice is also applicable to topic-related literature reviews, which often summarize the central findings of the reviewed articles in systematic tables. Both article types likely include the most prominent studies of a research field. The most common and important search strategy, however, is a keyword search in electronic databases (Harari et al. 2020 ). This strategy will probably yield the largest number of relevant studies, particularly so-called ‘grey literature’, which may not be considered by literature reviews. Gusenbauer and Haddaway ( 2020 ) provide a detailed overview of 34 scientific databases, of which 18 are multidisciplinary or have a focus on management sciences, along with their suitability for literature synthesis. To prevent biased results due to the scope or journal coverage of one database, researchers should use at least two different databases (DeSimone et al. 2020 ; Martín-Martín et al. 2021 ; Mongeon & Paul-Hus 2016 ). However, a database search can easily lead to an overload of potentially relevant studies. For example, key term searches in Google Scholar for “entrepreneurial intention” and “firm diversification” resulted in more than 660,000 and 810,000 hits, respectively. Footnote 1 Therefore, a precise research question and precise search terms using Boolean operators are advisable (Gusenbauer and Haddaway 2020 ). Addressing the challenge of identifying relevant articles in the growing number of database publications, (semi)automated approaches using text mining and machine learning (Bosco et al. 2017 ; O’Mara-Eves et al. 2015 ; Ouzzani et al. 2016 ; Thomas et al. 2017 ) can also be promising and time-saving search tools in the future. Also, some electronic databases offer the possibility to track forward citations of influential studies and thereby identify further relevant articles. Finally, collecting unpublished or undetected studies through conferences, personal contact with (leading) scholars, or listservs can be strategies to increase the study sample size (Grewal et al. 2018 ; Harari et al. 2020 ; Pigott and Polanin 2020 ).

2.2.2 Study inclusion criteria and sample composition

Next, researchers must decide which studies to include in the meta-analysis. Some guidelines for literature reviews recommend limiting the sample to studies published in renowned academic journals to ensure the quality of findings (e.g., Kraus et al. 2020 ). For meta-analysis, however, Steel et al. ( 2021 ) advocate for the inclusion of all available studies, including grey literature, to prevent selection biases based on availability, cost, familiarity, and language (Rothstein et al. 2005 ), or the “Matthew effect”, which denotes the phenomenon that highly cited articles are found faster than less cited articles (Merton 1968 ). Harrison et al. ( 2017 ) find that the effects of published studies in management are inflated on average by 30% compared to unpublished studies. This so-called publication bias or “file drawer problem” (Rosenthal 1979 ) results from the preference of academia to publish more statistically significant and less statistically insignificant study results. Owen and Li ( 2020 ) showed that publication bias is particularly severe when variables of interest are used as key variables rather than control variables. To consider the true effect size of a target variable or relationship, the inclusion of all types of research outputs is therefore recommended (Polanin et al. 2016 ). Different test procedures to identify publication bias are discussed subsequently in Step 7.

In addition to the decision of whether to include certain study types (i.e., published vs. unpublished studies), there can be other reasons to exclude studies that are identified in the search process. These reasons can be manifold and are primarily related to the specific research question and methodological peculiarities. For example, studies identified by keyword search might not qualify thematically after all, may use unsuitable variable measurements, or may not report usable effect sizes. Furthermore, there might be multiple studies by the same authors using similar datasets. If they do not differ sufficiently in terms of their sample characteristics or variables used, only one of these studies should be included to prevent bias from duplicates (Wood 2008 ; see this article for a detection heuristic).

In general, the screening process should be conducted stepwise, beginning with a removal of duplicate citations from different databases, followed by abstract screening to exclude clearly unsuitable studies and a final full-text screening of the remaining articles (Pigott and Polanin 2020 ). A graphical tool to systematically document the sample selection process is the PRISMA flow diagram (Moher et al. 2009 ). Page et al. ( 2021 ) recently presented an updated version of the PRISMA statement, including an extended item checklist and flow diagram to report the study process and findings.

2.3 Step 3: Choice of the effect size measure

2.3.1 Types of effect sizes

The two most common meta-analytical effect size measures in management studies are (z-transformed) correlation coefficients and standardized mean differences (Aguinis et al. 2011a ; Geyskens et al. 2009 ). However, meta-analyses in management science and related fields may not be limited to those two effect size measures but rather depend on the subfield of investigation (Borenstein 2009 ; Stanley and Doucouliagos 2012 ). In economics and finance, researchers are more interested in the examination of elasticities and marginal effects extracted from regression models than in pure bivariate correlations (Stanley and Doucouliagos 2012 ). Regression coefficients can also be converted to partial correlation coefficients based on their t-statistics to make regression results comparable across studies (Stanley and Doucouliagos 2012 ). Although some meta-analyses in management research have combined bivariate and partial correlations in their study samples, Aloe ( 2015 ) and Combs et al. ( 2019 ) advise researchers not to use this practice. Most importantly, they argue that the effect size strength of partial correlations depends on the other variables included in the regression model and is therefore incomparable to bivariate correlations (Schmidt and Hunter 2015 ), resulting in a possible bias of the meta-analytic results (Roth et al. 2018 ). We endorse this opinion. If at all, we recommend separate analyses for each measure. In addition to these measures, survival rates, risk ratios or odds ratios, which are common measures in medical research (Borenstein 2009 ), can be suitable effect sizes for specific management research questions, such as understanding the determinants of the survival of startup companies. To summarize, the choice of a suitable effect size is often taken away from the researcher because it is typically dependent on the investigated research question as well as the conventions of the specific research field (Cheung and Vijayakumar 2016 ).
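
To make these measures concrete, the following is a minimal Python sketch (not taken from the article, which works with statistical packages such as metafor in R) of two computations mentioned above: Fisher's z-transformation of a correlation coefficient and the conversion of a regression t-statistic into a partial correlation. The study values are invented.

    import math

    def fisher_z(r: float) -> float:
        """Fisher's z-transformation of a correlation (equivalently math.atanh(r))."""
        return 0.5 * math.log((1 + r) / (1 - r))

    def fisher_z_variance(n: int) -> float:
        """Approximate sampling variance of Fisher's z for a study with n observations."""
        return 1.0 / (n - 3)

    def partial_r_from_t(t: float, df: int) -> float:
        """Partial correlation implied by a regression coefficient's t-statistic."""
        return t / math.sqrt(t ** 2 + df)

    # One study reports r = .30 with n = 150; another reports t = 2.5 with 120 residual df.
    print(fisher_z(0.30), fisher_z_variance(150))
    print(partial_r_from_t(2.5, 120))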

2.3.2 Conversion of effect sizes to a common measure

After having defined the primary effect size measure for the meta-analysis, it might become necessary in the later coding process to convert study findings that are reported in effect sizes that are different from the chosen primary effect size. For example, a study might report only descriptive statistics for two study groups but no correlation coefficient, which is used as the primary effect size measure in the meta-analysis. Different effect size measures can be harmonized using conversion formulae, which are provided by standard method books such as Borenstein et al. ( 2009 ) or Lipsey and Wilson ( 2001 ). There also exist online effect size calculators for meta-analysis. Footnote 2
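
As an illustration of such a conversion (using the standard formulas summarized in Borenstein et al. 2009, with invented numbers), the following sketch turns the descriptive statistics of two study groups into Cohen's d and then into a correlation-type effect size:

    import math

    def cohens_d(m1, m2, sd1, sd2, n1, n2):
        """Standardized mean difference between two groups, using the pooled standard deviation."""
        pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
        return (m1 - m2) / pooled_sd

    def r_from_d(d, n1, n2):
        """Convert d to a correlation; the factor a reduces to 4 when group sizes are equal."""
        a = (n1 + n2) ** 2 / (n1 * n2)
        return d / math.sqrt(d ** 2 + a)

    d = cohens_d(m1=3.6, m2=3.1, sd1=0.9, sd2=1.0, n1=60, n2=55)
    print(d, r_from_d(d, 60, 55))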

2.4 Step 4: Choice of the analytical method used

Choosing which meta-analytical method to use is directly connected to the research question of the meta-analysis. Research questions in meta-analyses can address a relationship between constructs or an effect of an intervention in a general manner, or they can focus on moderating or mediating effects. There are four meta-analytical methods that are primarily used in contemporary management research (Combs et al. 2019 ; Geyer-Klingeberg et al. 2020 ), which allow the investigation of these different types of research questions: traditional univariate meta-analysis, meta-regression, meta-analytic structural equation modeling, and qualitative meta-analysis (Hoon 2013 ). While the first three are quantitative, the latter summarizes qualitative findings. Table 1 summarizes the key characteristics of the three quantitative methods.

2.4.1 Univariate meta-analysis

In its traditional form, a meta-analysis reports a weighted mean effect size for the relationship or intervention of investigation and provides information on the magnitude of variance among primary studies (Aguinis et al. 2011c ; Borenstein et al. 2009 ). Accordingly, it serves as a quantitative synthesis of a research field (Borenstein et al. 2009 ; Geyskens et al. 2009 ). Prominent traditional approaches have been developed, for example, by Hedges and Olkin ( 1985 ) or Hunter and Schmidt ( 1990 , 2004 ). However, going beyond its simple summary function, the traditional approach has limitations in explaining the observed variance among findings (Gonzalez-Mulé and Aguinis 2018 ). To identify moderators (or boundary conditions) of the relationship of interest, meta-analysts can create subgroups and investigate differences between those groups (Borenstein and Higgins 2013 ; Hunter and Schmidt 2004 ). Potential moderators can be study characteristics (e.g., whether a study is published vs. unpublished), sample characteristics (e.g., study country, industry focus, or type of survey/experiment participants), or measurement artifacts (e.g., different types of variable measurements). The univariate approach is thus suitable to identify the overall direction of a relationship and can serve as a good starting point for additional analyses. However, due to its limitations in examining boundary conditions and developing theory, the univariate approach on its own is currently oftentimes viewed as not sufficient (Rauch 2020 ; Shaw and Ertug 2017 ).
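
The basic machinery behind such a weighted mean can be illustrated with a short, self-contained sketch (plain Python with invented numbers rather than a dedicated meta-analysis package): each study is weighted by the inverse of its sampling variance, and Cochran's Q gives a simple measure of heterogeneity among the primary studies.

    # Effect sizes are Fisher-z correlations; their sampling variance is 1/(n - 3).
    yi = [0.25, 0.10, 0.32, 0.18]            # study effect sizes
    vi = [1 / 97, 1 / 147, 1 / 57, 1 / 197]  # sampling variances for n = 100, 150, 60, 200

    wi = [1 / v for v in vi]                 # inverse-variance (fixed-effect) weights
    pooled = sum(w * y for w, y in zip(wi, yi)) / sum(wi)
    se_pooled = (1 / sum(wi)) ** 0.5
    q = sum(w * (y - pooled) ** 2 for w, y in zip(wi, yi))  # Cochran's Q, df = k - 1

    print(f"pooled z = {pooled:.3f} (SE = {se_pooled:.3f}), Q = {q:.2f}")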

2.4.2 Meta-regression analysis

Meta-regression analysis (Hedges and Olkin 1985 ; Lipsey and Wilson 2001 ; Stanley and Jarrell 1989 ) aims to investigate the heterogeneity among observed effect sizes by testing multiple potential moderators simultaneously. In meta-regression, the coded effect size is used as the dependent variable and is regressed on a list of moderator variables. These moderator variables can be categorical variables as described previously in the traditional univariate approach or (semi)continuous variables such as country scores that are merged with the meta-analytical data. Thus, meta-regression analysis overcomes the disadvantages of the traditional approach, which only allows us to investigate moderators singularly using dichotomized subgroups (Combs et al. 2019 ; Gonzalez-Mulé and Aguinis 2018 ). These possibilities allow a more fine-grained analysis of research questions that are related to moderating effects. However, Schmidt ( 2017 ) critically notes that the number of effect sizes in the meta-analytical sample must be sufficiently large to produce reliable results when investigating multiple moderators simultaneously in a meta-regression. For further reading, Tipton et al. ( 2019 ) outline the technical, conceptual, and practical developments of meta-regression over the last decades. Gonzalez-Mulé and Aguinis ( 2018 ) provide an overview of methodological choices and develop evidence-based best practices for future meta-analyses in management using meta-regression.
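
A stripped-down illustration of this idea, with invented data: the coded effect sizes are regressed on a single moderator using weighted least squares with inverse-variance weights. A real mixed-effects meta-regression, as implemented in dedicated tools such as the metafor package, would also estimate the between-study variance and compute appropriate standard errors.

    import numpy as np
    import statsmodels.api as sm

    yi = np.array([0.25, 0.10, 0.32, 0.18, 0.05])        # coded effect sizes
    vi = np.array([0.010, 0.007, 0.018, 0.005, 0.012])   # sampling variances
    published = np.array([1, 1, 0, 1, 0])                # example moderator (published vs. not)

    X = sm.add_constant(published)
    model = sm.WLS(yi, X, weights=1 / vi).fit()
    print(model.params)  # intercept and moderator coefficient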

2.4.3 Meta-analytic structural equation modeling (MASEM)

MASEM is a combination of meta-analysis and structural equation modeling and allows researchers to simultaneously investigate the relationships among several constructs in a path model. Researchers can use MASEM to test several competing theoretical models against each other or to identify mediation mechanisms in a chain of relationships (Bergh et al. 2016). This method is typically performed in two steps (Cheung and Chan 2005): in Step 1, a pooled correlation matrix is derived, which includes the meta-analytical mean effect sizes for all variable combinations; Step 2 then uses this matrix to fit the path model. While MASEM was based primarily on traditional univariate meta-analysis to derive the pooled correlation matrix in its early years (Viswesvaran and Ones 1995), more advanced methods, such as the GLS approach (Becker 1992, 1995) or the TSSEM approach (Cheung and Chan 2005), have since been developed. Cheung (2015a) and Jak (2015) provide an overview of these approaches in their books with exemplary code. For datasets with more complex data structures, Wilson et al. (2016) also developed a multilevel approach that is related to the TSSEM approach in the second step. Bergh et al. (2016) discuss nine decision points and develop best practices for MASEM studies.

2.4.4 Qualitative meta-analysis

While the approaches explained above focus on quantitative outcomes of empirical studies, qualitative meta-analysis aims to synthesize qualitative findings from case studies (Hoon 2013 ; Rauch et al. 2014 ). The distinctive feature of qualitative case studies is their potential to provide in-depth information about specific contextual factors or to shed light on reasons for certain phenomena that cannot usually be investigated by quantitative studies (Rauch 2020 ; Rauch et al. 2014 ). In a qualitative meta-analysis, the identified case studies are systematically coded in a meta-synthesis protocol, which is then used to identify influential variables or patterns and to derive a meta-causal network (Hoon 2013 ). Thus, the insights of contextualized and typically nongeneralizable single studies are aggregated to a larger, more generalizable picture (Habersang et al. 2019 ). Although still the exception, this method can thus provide important contributions for academics in terms of theory development (Combs et al., 2019 ; Hoon 2013 ) and for practitioners in terms of evidence-based management or entrepreneurship (Rauch et al. 2014 ). Levitt ( 2018 ) provides a guide and discusses conceptual issues for conducting qualitative meta-analysis in psychology, which is also useful for management researchers.

2.5 Step 5: Choice of software

Software solutions to perform meta-analyses range from built-in functions or additional packages of statistical software to software purely focused on meta-analyses and from commercial to open-source solutions. However, in addition to personal preferences, the choice of the most suitable software depends on the complexity of the methods used and the dataset itself (Cheung and Vijayakumar 2016 ). Meta-analysts therefore must carefully check if their preferred software is capable of performing the intended analysis.

Among commercial software providers, Stata (from version 16 on) offers built-in functions to perform various meta-analytical analyses or to produce various plots (Palmer and Sterne 2016 ). For SPSS and SAS, there exist several macros for meta-analyses provided by scholars, such as David B. Wilson or Andy P. Field and Raphael Gillet (Field and Gillett 2010 ). Footnote 3 Footnote 4 For researchers using the open-source software R (R Core Team 2021 ), Polanin et al. ( 2017 ) provide an overview of 63 meta-analysis packages and their functionalities. For new users, they recommend the package metafor (Viechtbauer 2010 ), which includes most necessary functions and for which the author Wolfgang Viechtbauer provides tutorials on his project website. Footnote 5 Footnote 6 In addition to packages and macros for statistical software, templates for Microsoft Excel have also been developed to conduct simple meta-analyses, such as Meta-Essentials by Suurmond et al. ( 2017 ). Footnote 7 Finally, programs purely dedicated to meta-analysis also exist, such as Comprehensive Meta-Analysis (Borenstein et al. 2013 ) or RevMan by The Cochrane Collaboration ( 2020 ).

2.6 Step 6: Coding of effect sizes

2.6.1 Coding sheet

The first step in the coding process is the design of the coding sheet. A universal template does not exist because the design of the coding sheet depends on the methods used, the respective software, and the complexity of the research design. For univariate meta-analysis or meta-regression, data are typically coded in wide format. In its simplest form, when investigating a correlational relationship between two variables using the univariate approach, the coding sheet would contain a column for the study name or identifier, the effect size coded from the primary study, and the study sample size. However, such simple relationships are unlikely in management research because the included studies are typically not identical but differ in several respects. With more complex data structures or moderator variables being investigated, additional columns are added to the coding sheet to reflect the data characteristics. These variables can be coded as dummy, factor, or (semi)continuous variables and later used to perform a subgroup analysis or meta regression. For MASEM, the required data input format can deviate depending on the method used (e.g., TSSEM requires a list of correlation matrices as data input). For qualitative meta-analysis, the coding scheme typically summarizes the key qualitative findings and important contextual and conceptual information (see Hoon ( 2013 ) for a coding scheme for qualitative meta-analysis). Figure  1 shows an exemplary coding scheme for a quantitative meta-analysis on the correlational relationship between top-management team diversity and profitability. In addition to effect and sample sizes, information about the study country, firm type, and variable operationalizations are coded. The list could be extended by further study and sample characteristics.

Figure 1: Exemplary coding sheet for a meta-analysis on the relationship (correlation) between top-management team diversity and profitability
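
For illustration, such a coding sheet can be set up as a simple table. The following sketch uses invented studies and column names that mirror the variables described above rather than the exact layout of Fig. 1.

    import pandas as pd

    coding_sheet = pd.DataFrame({
        "study":             ["Smith 2015", "Lee 2018", "Garcia 2020"],
        "effect_size_r":     [0.12, 0.25, -0.05],
        "sample_size":       [180, 95, 240],
        "country":           ["US", "KR", "ES"],
        "firm_type":         ["public", "private", "public"],
        "diversity_measure": ["Blau index", "share of women", "Blau index"],
        "profit_measure":    ["ROA", "ROE", "ROA"],
    })
    coding_sheet.to_csv("coding_sheet.csv", index=False)  # wide format, one row per effect size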

2.6.2 Inclusion of moderator or control variables

It is generally important to consider the intended research model and relevant nontarget variables before coding a meta-analytic dataset. For example, study characteristics can be important moderators or function as control variables in a meta-regression model. Similarly, control variables may be relevant in a MASEM approach to reduce confounding bias. Coding additional variables or constructs subsequently can be arduous if the sample of primary studies is large. However, the decision to include respective moderator or control variables, as in any empirical analysis, should always be based on strong (theoretical) rationales about how these variables can impact the investigated effect (Bernerth and Aguinis 2016 ; Bernerth et al. 2018 ; Thompson and Higgins 2002 ). While substantive moderators refer to theoretical constructs that act as buffers or enhancers of a supposed causal process, methodological moderators are features of the respective research designs that denote the methodological context of the observations and are important to control for systematic statistical particularities (Rudolph et al. 2020 ). Havranek et al. ( 2020 ) provide a list of recommended variables to code as potential moderators. While researchers may have clear expectations about the effects for some of these moderators, the concerns for other moderators may be tentative, and moderator analysis may be approached in a rather exploratory fashion. Thus, we argue that researchers should make full use of the meta-analytical design to obtain insights about potential context dependence that a primary study cannot achieve.

2.6.3 Treatment of multiple effect sizes in a study

A long-debated issue in conducting meta-analyses is whether to use only one or all available effect sizes for the same construct within a single primary study. For meta-analyses in management research, this question is fundamental because many empirical studies, particularly those relying on company databases, use multiple variables for the same construct to perform sensitivity analyses, resulting in multiple relevant effect sizes. In this case, researchers can either (randomly) select a single value, calculate a study average, or use the complete set of effect sizes (Bijmolt and Pieters 2001 ; López-López et al. 2018 ). Multiple effect sizes from the same study enrich the meta-analytic dataset and allow us to investigate the heterogeneity of the relationship of interest, such as different variable operationalizations (López-López et al. 2018 ; Moeyaert et al. 2017 ). However, including more than one effect size from the same study violates the independency assumption of observations (Cheung 2019 ; López-López et al. 2018 ), which can lead to biased results and erroneous conclusions (Gooty et al. 2021 ). We follow the recommendation of current best practice guides to take advantage of using all available effect size observations but to carefully consider interdependencies using appropriate methods such as multilevel models, panel regression models, or robust variance estimation (Cheung 2019 ; Geyer-Klingeberg et al. 2020 ; Gooty et al. 2021 ; López-López et al. 2018 ; Moeyaert et al. 2017 ).
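
The two simpler strategies named above can be sketched as follows (invented data). The third option, modeling the dependencies directly, requires dedicated multilevel or robust variance estimation routines and is only hinted at here by keeping the study identifier as a cluster variable.

    import pandas as pd

    effects = pd.DataFrame({
        "study_id":      [1, 1, 2, 3, 3, 3],
        "effect_size_r": [0.20, 0.28, 0.05, 0.15, 0.11, 0.18],
        "sample_size":   [150, 150, 90, 210, 210, 210],
    })

    # (a) collapse to one averaged effect size per study
    study_level = effects.groupby("study_id", as_index=False).mean()

    # (b) keep all effect sizes; study_id marks the clusters of dependent observations
    clustered = effects.copy()

    print(study_level)
    print(clustered)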

2.7 Step 7: Analysis

2.7.1 Outlier analysis and tests for publication bias

Before conducting the primary analysis, some preliminary sensitivity analyses might be necessary, which should ensure the robustness of the meta-analytical findings (Rudolph et al. 2020 ). First, influential outlier observations could potentially bias the observed results, particularly if the number of total effect sizes is small. Several statistical methods can be used to identify outliers in meta-analytical datasets (Aguinis et al. 2013 ; Viechtbauer and Cheung 2010 ). However, there is a debate about whether to keep or omit these observations. Anyhow, relevant studies should be closely inspected to infer an explanation about their deviating results. As in any other primary study, outliers can be a valid representation, albeit representing a different population, measure, construct, design or procedure. Thus, inferences about outliers can provide the basis to infer potential moderators (Aguinis et al. 2013 ; Steel et al. 2021 ). On the other hand, outliers can indicate invalid research, for instance, when unrealistically strong correlations are due to construct overlap (i.e., lack of a clear demarcation between independent and dependent variables), invalid measures, or simply typing errors when coding effect sizes. An advisable step is therefore to compare the results both with and without outliers and base the decision on whether to exclude outlier observations with careful consideration (Geyskens et al. 2009 ; Grewal et al. 2018 ; Kepes et al. 2013 ). However, instead of simply focusing on the size of the outlier, its leverage should be considered. Thus, Viechtbauer and Cheung ( 2010 ) propose considering a combination of standardized deviation and a study’s leverage.
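
As a rough illustration only (it is no substitute for the studentized residuals and Cook's distance proposed by Viechtbauer and Cheung 2010), the following sketch flags effect sizes that deviate strongly from the pooled mean and that carry a large share of the total weight; the numbers are invented.

    yi = [0.25, 0.10, 0.85, 0.18, 0.22]      # effect sizes (the third value is suspicious)
    vi = [0.010, 0.007, 0.012, 0.005, 0.009]

    wi = [1 / v for v in vi]
    pooled = sum(w * y for w, y in zip(wi, yi)) / sum(wi)

    for y, v, w in zip(yi, vi, wi):
        z = (y - pooled) / v ** 0.5          # standardized deviation from the pooled mean
        leverage = w / sum(wi)               # crude proxy for a study's leverage
        if abs(z) > 1.96:
            print(f"potential outlier: y = {y}, z = {z:.2f}, weight share = {leverage:.2f}")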

Second, as mentioned in the context of the literature search, potential publication bias may be an issue. Publication bias can be examined in multiple ways (Rothstein et al. 2005). First, the funnel plot is a simple graphical tool that can provide an overview of the effect size distribution and help to detect publication bias (Stanley and Doucouliagos 2010). A funnel plot can also help to identify potential outliers. As mentioned above, a graphical display of deviation (e.g., studentized residuals) and leverage (Cook's distance) can help detect the presence of outliers and evaluate their influence (Viechtbauer and Cheung 2010). Moreover, several statistical procedures can be used to test for publication bias (Harrison et al. 2017; Kepes et al. 2012), including subgroup comparisons between published and unpublished studies, Begg and Mazumdar's (1994) rank correlation test, cumulative meta-analysis (Borenstein et al. 2009), the trim and fill method (Duval and Tweedie 2000a, b), Egger et al.'s (1997) regression test, failsafe N (Rosenthal 1979), or selection models (Hedges and Vevea 2005; Vevea and Woods 2005). In examining potential publication bias, Kepes et al. (2012) and Harrison et al. (2017) both recommend not relying on a single test but rather using multiple conceptually different test procedures (i.e., the so-called "triangulation approach").
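
One of these procedures, Egger et al.'s (1997) regression test, can be sketched in a few lines (invented data): the standardized effect is regressed on precision, and an intercept clearly different from zero indicates funnel plot asymmetry. In practice, this test is available, for example, as regtest() in the R package metafor.

    import numpy as np
    import statsmodels.api as sm

    yi = np.array([0.25, 0.10, 0.32, 0.18, 0.05, 0.40])
    vi = np.array([0.010, 0.007, 0.018, 0.005, 0.012, 0.030])
    sei = np.sqrt(vi)

    standardized_effect = yi / sei
    precision = 1 / sei

    res = sm.OLS(standardized_effect, sm.add_constant(precision)).fit()
    print(res.params[0], res.pvalues[0])  # intercept and its p-value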

2.7.2 Model choice

After controlling and correcting for the potential presence of impactful outliers or publication bias, the next step in meta-analysis is the primary analysis, where meta-analysts must decide between two different types of models that are based on different assumptions: fixed-effects and random-effects (Borenstein et al. 2010 ). Fixed-effects models assume that all observations share a common mean effect size, which means that differences are only due to sampling error, while random-effects models assume heterogeneity and allow for a variation of the true effect sizes across studies (Borenstein et al. 2010 ; Cheung and Vijayakumar 2016 ; Hunter and Schmidt 2004 ). Both models are explained in detail in standard textbooks (e.g., Borenstein et al. 2009 ; Hunter and Schmidt 2004 ; Lipsey and Wilson 2001 ).

In general, the presence of heterogeneity is likely in management meta-analyses because most studies do not have identical empirical settings, which can yield different effect size strengths or directions for the same investigated phenomenon. For example, the identified studies have been conducted in different countries with different institutional settings, or the type of study participants varies (e.g., students vs. employees, blue-collar vs. white-collar workers, or manufacturing vs. service firms). Thus, the vast majority of meta-analyses in management research and related fields use random-effects models (Aguinis et al. 2011a ). In a meta-regression, the random-effects model turns into a so-called mixed-effects model because moderator variables are added as fixed effects to explain the impact of observed study characteristics on effect size variations (Raudenbush 2009 ).
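
The random-effects computation itself follows standard textbook formulas (e.g., Borenstein et al. 2009). The following sketch, with invented numbers, uses the common DerSimonian–Laird estimator of the between-study variance tau² and then re-weights the studies accordingly.

    yi = [0.25, 0.10, 0.32, 0.18, 0.05]
    vi = [0.010, 0.007, 0.018, 0.005, 0.012]
    k = len(yi)

    wi = [1 / v for v in vi]                              # fixed-effect weights
    fixed_mean = sum(w * y for w, y in zip(wi, yi)) / sum(wi)
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(wi, yi))
    c = sum(wi) - sum(w ** 2 for w in wi) / sum(wi)
    tau2 = max(0.0, (q - (k - 1)) / c)                    # DerSimonian–Laird estimate of tau^2

    wi_star = [1 / (v + tau2) for v in vi]                # random-effects weights
    re_mean = sum(w * y for w, y in zip(wi_star, yi)) / sum(wi_star)
    se_re = (1 / sum(wi_star)) ** 0.5
    print(f"tau^2 = {tau2:.4f}, random-effects mean = {re_mean:.3f} (SE = {se_re:.3f})")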

2.8 Step 8: Reporting results

2.8.1 Reporting in the article

The final step in performing a meta-analysis is reporting its results. Most importantly, all steps and methodological decisions should be comprehensible to the reader. DeSimone et al. ( 2020 ) provide an extensive checklist for journal reviewers of meta-analytical studies. This checklist can also be used by authors when performing their analyses and reporting their results to ensure that all important aspects have been addressed. Alternative checklists are provided, for example, by Appelbaum et al. ( 2018 ) or Page et al. ( 2021 ). Similarly, Levitt et al. ( 2018 ) provide a detailed guide for qualitative meta-analysis reporting standards.

For quantitative meta-analyses, tables reporting results should include all important information and test statistics, including mean effect sizes; standard errors and confidence intervals; the number of observations and study samples included; and heterogeneity measures. If the meta-analytic sample is rather small, a forest plot provides a good overview of the different findings and their accuracy. However, this figure will be less feasible for meta-analyses with several hundred effect sizes included. Also, results displayed in the tables and figures must be explained verbally in the results and discussion sections. Most importantly, authors must answer the primary research question, i.e., whether there is a positive, negative, or no relationship between the variables of interest, or whether the examined intervention has a certain effect. These results should be interpreted with regard to their magnitude (or significance), both economically and statistically. However, when discussing meta-analytical results, authors must describe the complexity of the results, including the identified heterogeneity and important moderators, future research directions, and theoretical relevance (DeSimone et al. 2019 ). In particular, the discussion of identified heterogeneity and underlying moderator effects is critical; not including this information can lead to false conclusions among readers, who interpret the reported mean effect size as universal for all included primary studies and ignore the variability of findings when citing the meta-analytic results in their research (Aytug et al. 2012 ; DeSimone et al. 2019 ).
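
A forest plot can be produced with any plotting library. The following sketch (invented studies and estimates) only illustrates the basic layout; dedicated routines, such as forest() in the R package metafor, produce richer, publication-ready versions.

    import numpy as np
    import matplotlib.pyplot as plt

    studies = ["Study A", "Study B", "Study C", "Study D"]
    yi = np.array([0.25, 0.10, 0.32, 0.18])      # effect sizes
    sei = np.array([0.10, 0.08, 0.13, 0.07])     # standard errors
    pooled, pooled_se = 0.19, 0.05               # pooled estimate, e.g., from a random-effects model

    ypos = np.arange(len(studies), 0, -1)
    plt.errorbar(yi, ypos, xerr=1.96 * sei, fmt="s", color="black", capsize=3)
    plt.errorbar([pooled], [0], xerr=[1.96 * pooled_se], fmt="D", color="blue", capsize=3)
    plt.yticks(list(ypos) + [0], studies + ["Pooled"])
    plt.axvline(0, linestyle="--", color="grey")  # reference line at a null effect
    plt.xlabel("Effect size with 95% CI")
    plt.tight_layout()
    plt.savefig("forest_plot.png")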

2.8.2 Open-science practices

Another increasingly important topic is the public provision of meta-analytical datasets and statistical codes via open-source repositories. Open-science practices allow for results validation and for the use of coded data in subsequent meta-analyses ( Polanin et al. 2020 ), contributing to the development of cumulative science. Steel et al. ( 2021 ) refer to open science meta-analyses as a step towards “living systematic reviews” (Elliott et al. 2017 ) with continuous updates in real time. MRQ supports this development and encourages authors to make their datasets publicly available. Moreau and Gamble ( 2020 ), for example, provide various templates and video tutorials to conduct open science meta-analyses. There exist several open science repositories, such as the Open Science Foundation (OSF; for a tutorial, see Soderberg 2018 ), to preregister and make documents publicly available. Furthermore, several initiatives in the social sciences have been established to develop dynamic meta-analyses, such as metaBUS (Bosco et al. 2015 , 2017 ), MetaLab (Bergmann et al. 2018 ), or PsychOpen CAMA (Burgard et al. 2021 ).

3 Conclusion

This editorial provides a comprehensive overview of the essential steps in conducting and reporting a meta-analysis with references to more in-depth methodological articles. It also serves as a guide for meta-analyses submitted to MRQ and other management journals. MRQ welcomes all types of meta-analyses from all subfields and disciplines of management research.

Footnotes

1. Gusenbauer and Haddaway (2020), however, point out that Google Scholar is not appropriate as a primary search engine due to a lack of reproducibility of search results.

2. One effect size calculator by David B. Wilson is accessible via: https://www.campbellcollaboration.org/escalc/html/EffectSizeCalculator-Home.php

3. The macros of David B. Wilson can be downloaded from: http://mason.gmu.edu/~dwilsonb/

4. The macros of Field and Gillett (2010) can be downloaded from: https://www.discoveringstatistics.com/repository/fieldgillett/how_to_do_a_meta_analysis.html

5. The tutorials can be found via: https://www.metafor-project.org/doku.php

6. metafor does not currently provide functions to conduct MASEM. For MASEM, users can, for instance, use the package metaSEM (Cheung 2015b).

7. The workbooks can be downloaded from: https://www.erim.eur.nl/research-support/meta-essentials/

Aguinis H, Dalton DR, Bosco FA, Pierce CA, Dalton CM (2011a) Meta-analytic choices and judgment calls: Implications for theory building and testing, obtained effect sizes, and scholarly impact. J Manag 37(1):5–38

Aguinis H, Gottfredson RK, Joo H (2013) Best-practice recommendations for defining, identifying, and handling outliers. Organ Res Methods 16(2):270–301

Aguinis H, Gottfredson RK, Wright TA (2011b) Best-practice recommendations for estimating interaction effects using meta-analysis. J Organ Behav 32(8):1033–1043

Aguinis H, Pierce CA, Bosco FA, Dalton DR, Dalton CM (2011c) Debunking myths and urban legends about meta-analysis. Organ Res Methods 14(2):306–331

Aloe AM (2015) Inaccuracy of regression results in replacing bivariate correlations. Res Synth Methods 6(1):21–27

Anderson RG, Kichkha A (2017) Replication, meta-analysis, and research synthesis in economics. Am Econ Rev 107(5):56–59

Appelbaum M, Cooper H, Kline RB, Mayo-Wilson E, Nezu AM, Rao SM (2018) Journal article reporting standards for quantitative research in psychology: the APA publications and communications BOARD task force report. Am Psychol 73(1):3–25

Aytug ZG, Rothstein HR, Zhou W, Kern MC (2012) Revealed or concealed? Transparency of procedures, decisions, and judgment calls in meta-analyses. Organ Res Methods 15(1):103–133

Begg CB, Mazumdar M (1994) Operating characteristics of a rank correlation test for publication bias. Biometrics 50(4):1088–1101. https://doi.org/10.2307/2533446

Bergh DD, Aguinis H, Heavey C, Ketchen DJ, Boyd BK, Su P, Lau CLL, Joo H (2016) Using meta-analytic structural equation modeling to advance strategic management research: Guidelines and an empirical illustration via the strategic leadership-performance relationship. Strateg Manag J 37(3):477–497

Becker BJ (1992) Using results from replicated studies to estimate linear models. J Educ Stat 17(4):341–362

Becker BJ (1995) Corrections to “Using results from replicated studies to estimate linear models.” J Edu Behav Stat 20(1):100–102

Bergmann C, Tsuji S, Piccinini PE, Lewis ML, Braginsky M, Frank MC, Cristia A (2018) Promoting replicability in developmental research through meta-analyses: Insights from language acquisition research. Child Dev 89(6):1996–2009

Bernerth JB, Aguinis H (2016) A critical review and best-practice recommendations for control variable usage. Pers Psychol 69(1):229–283

Bernerth JB, Cole MS, Taylor EC, Walker HJ (2018) Control variables in leadership research: A qualitative and quantitative review. J Manag 44(1):131–160

Bijmolt TH, Pieters RG (2001) Meta-analysis in marketing when studies contain multiple measurements. Mark Lett 12(2):157–169

Block J, Kuckertz A (2018) Seven principles of effective replication studies: Strengthening the evidence base of management research. Manag Rev Quart 68:355–359

Borenstein M (2009) Effect sizes for continuous data. In: Cooper H, Hedges LV, Valentine JC (eds) The handbook of research synthesis and meta-analysis. Russell Sage Foundation, pp 221–235

Borenstein M, Hedges LV, Higgins JPT, Rothstein HR (2009) Introduction to meta-analysis. John Wiley, Chichester

Borenstein M, Hedges LV, Higgins JPT, Rothstein HR (2010) A basic introduction to fixed-effect and random-effects models for meta-analysis. Res Synth Methods 1(2):97–111

Borenstein M, Hedges L, Higgins J, Rothstein H (2013) Comprehensive meta-analysis (version 3). Biostat, Englewood, NJ

Borenstein M, Higgins JP (2013) Meta-analysis and subgroups. Prev Sci 14(2):134–143

Bosco FA, Steel P, Oswald FL, Uggerslev K, Field JG (2015) Cloud-based meta-analysis to bridge science and practice: Welcome to metaBUS. Person Assess Decis 1(1):3–17

Bosco FA, Uggerslev KL, Steel P (2017) MetaBUS as a vehicle for facilitating meta-analysis. Hum Resour Manag Rev 27(1):237–254

Burgard T, Bošnjak M, Studtrucker R (2021) Community-augmented meta-analyses (CAMAs) in psychology: potentials and current systems. Zeitschrift Für Psychologie 229(1):15–23

Cheung MWL (2015a) Meta-analysis: A structural equation modeling approach. John Wiley & Sons, Chichester

Cheung MWL (2015b) metaSEM: An R package for meta-analysis using structural equation modeling. Front Psychol 5:1521

Cheung MWL (2019) A guide to conducting a meta-analysis with non-independent effect sizes. Neuropsychol Rev 29(4):387–396

Cheung MWL, Chan W (2005) Meta-analytic structural equation modeling: a two-stage approach. Psychol Methods 10(1):40–64

Cheung MWL, Vijayakumar R (2016) A guide to conducting a meta-analysis. Neuropsychol Rev 26(2):121–128

Combs JG, Crook TR, Rauch A (2019) Meta-analytic research in management: contemporary approaches unresolved controversies and rising standards. J Manag Stud 56(1):1–18. https://doi.org/10.1111/joms.12427

DeSimone JA, Köhler T, Schoen JL (2019) If it were only that easy: the use of meta-analytic research by organizational scholars. Organ Res Methods 22(4):867–891. https://doi.org/10.1177/1094428118756743

DeSimone JA, Brannick MT, O’Boyle EH, Ryu JW (2020) Recommendations for reviewing meta-analyses in organizational research. Organ Res Methods 56:455–463

Duval S, Tweedie R (2000a) Trim and fill: a simple funnel-plot–based method of testing and adjusting for publication bias in meta-analysis. Biometrics 56(2):455–463

Duval S, Tweedie R (2000b) A nonparametric “trim and fill” method of accounting for publication bias in meta-analysis. J Am Stat Assoc 95(449):89–98

Egger M, Smith GD, Schneider M, Minder C (1997) Bias in meta-analysis detected by a simple, graphical test. BMJ 315(7109):629–634

Eisend M (2017) Meta-Analysis in advertising research. J Advert 46(1):21–35

Elliott JH, Synnot A, Turner T, Simmons M, Akl EA, McDonald S, Salanti G, Meerpohl J, MacLehose H, Hilton J, Tovey D, Shemilt I, Thomas J (2017) Living systematic review: 1. Introduction—the why, what, when, and how. J Clin Epidemiol 91:23–30. https://doi.org/10.1016/j.jclinepi.2017.08.010

Field AP, Gillett R (2010) How to do a meta-analysis. Br J Math Stat Psychol 63(3):665–694

Fisch C, Block J (2018) Six tips for your (systematic) literature review in business and management research. Manag Rev Quart 68:103–106

Fortunato S, Bergstrom CT, Börner K, Evans JA, Helbing D, Milojević S, Petersen AM, Radicchi F, Sinatra R, Uzzi B, Vespignani A (2018) Science of science. Science 359(6379). https://doi.org/10.1126/science.aao0185

Geyer-Klingeberg J, Hang M, Rathgeber A (2020) Meta-analysis in finance research: Opportunities, challenges, and contemporary applications. Int Rev Finan Anal 71:101524

Geyskens I, Krishnan R, Steenkamp JBE, Cunha PV (2009) A review and evaluation of meta-analysis practices in management research. J Manag 35(2):393–419

Glass GV (2015) Meta-analysis at middle age: a personal history. Res Synth Methods 6(3):221–231

Gonzalez-Mulé E, Aguinis H (2018) Advancing theory by assessing boundary conditions with metaregression: a critical review and best-practice recommendations. J Manag 44(6):2246–2273

Gooty J, Banks GC, Loignon AC, Tonidandel S, Williams CE (2021) Meta-analyses as a multi-level model. Organ Res Methods 24(2):389–411. https://doi.org/10.1177/1094428119857471

Grewal D, Puccinelli N, Monroe KB (2018) Meta-analysis: integrating accumulated knowledge. J Acad Mark Sci 46(1):9–30

Gurevitch J, Koricheva J, Nakagawa S, Stewart G (2018) Meta-analysis and the science of research synthesis. Nature 555(7695):175–182

Gusenbauer M, Haddaway NR (2020) Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res Synth Methods 11(2):181–217

Habersang S, Küberling-Jost J, Reihlen M, Seckler C (2019) A process perspective on organizational failure: a qualitative meta-analysis. J Manage Stud 56(1):19–56

Harari MB, Parola HR, Hartwell CJ, Riegelman A (2020) Literature searches in systematic reviews and meta-analyses: A review, evaluation, and recommendations. J Vocat Behav 118:103377

Harrison JS, Banks GC, Pollack JM, O’Boyle EH, Short J (2017) Publication bias in strategic management research. J Manag 43(2):400–425

Havránek T, Stanley TD, Doucouliagos H, Bom P, Geyer-Klingeberg J, Iwasaki I, Reed WR, Rost K, Van Aert RCM (2020) Reporting guidelines for meta-analysis in economics. J Econ Surveys 34(3):469–475

Hedges LV, Olkin I (1985) Statistical methods for meta-analysis. Academic Press, Orlando

Hedges LV, Vevea JL (2005) Selection methods approaches. In: Rothstein HR, Sutton A, Borenstein M (eds) Publication bias in meta-analysis: prevention, assessment, and adjustments. Wiley, Chichester, pp 145–174

Hoon C (2013) Meta-synthesis of qualitative case studies: an approach to theory building. Organ Res Methods 16(4):522–556

Hunter JE, Schmidt FL (1990) Methods of meta-analysis: correcting error and bias in research findings. Sage, Newbury Park

Hunter JE, Schmidt FL (2004) Methods of meta-analysis: correcting error and bias in research findings, 2nd edn. Sage, Thousand Oaks

Hunter JE, Schmidt FL, Jackson GB (1982) Meta-analysis: cumulating research findings across studies. Sage Publications, Beverly Hills

Jak S (2015) Meta-analytic structural equation modelling. Springer, New York, NY

Kepes S, Banks GC, McDaniel M, Whetzel DL (2012) Publication bias in the organizational sciences. Organ Res Methods 15(4):624–662

Kepes S, McDaniel MA, Brannick MT, Banks GC (2013) Meta-analytic reviews in the organizational sciences: Two meta-analytic schools on the way to MARS (the Meta-Analytic Reporting Standards). J Bus Psychol 28(2):123–143

Kraus S, Breier M, Dasí-Rodríguez S (2020) The art of crafting a systematic literature review in entrepreneurship research. Int Entrepreneur Manag J 16(3):1023–1042

Levitt HM (2018) How to conduct a qualitative meta-analysis: tailoring methods to enhance methodological integrity. Psychother Res 28(3):367–378

Levitt HM, Bamberg M, Creswell JW, Frost DM, Josselson R, Suárez-Orozco C (2018) Journal article reporting standards for qualitative primary, qualitative meta-analytic, and mixed methods research in psychology: the APA publications and communications board task force report. Am Psychol 73(1):26

Lipsey MW, Wilson DB (2001) Practical meta-analysis. Sage Publications, Inc.

López-López JA, Page MJ, Lipsey MW, Higgins JP (2018) Dealing with effect size multiplicity in systematic reviews and meta-analyses. Res Synth Methods 9(3):336–351

Martín-Martín A, Thelwall M, Orduna-Malea E, López-Cózar ED (2021) Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations’ COCI: a multidisciplinary comparison of coverage via citations. Scientometrics 126(1):871–906

Merton RK (1968) The Matthew effect in science: the reward and communication systems of science are considered. Science 159(3810):56–63

Moeyaert M, Ugille M, Natasha Beretvas S, Ferron J, Bunuan R, Van den Noortgate W (2017) Methods for dealing with multiple outcomes in meta-analysis: a comparison between averaging effect sizes, robust variance estimation and multilevel meta-analysis. Int J Soc Res Methodol 20(6):559–572

Moher D, Liberati A, Tetzlaff J, Altman DG, Prisma Group (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS medicine. 6(7):e1000097

Mongeon P, Paul-Hus A (2016) The journal coverage of Web of Science and Scopus: a comparative analysis. Scientometrics 106(1):213–228

Moreau D, Gamble B (2020) Conducting a meta-analysis in the age of open science: Tools, tips, and practical recommendations. Psychol Methods. https://doi.org/10.1037/met0000351

O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S (2015) Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev 4(1):1–22

Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A (2016) Rayyan—a web and mobile app for systematic reviews. Syst Rev 5(1):1–10

Owen E, Li Q (2021) The conditional nature of publication bias: a meta-regression analysis. Polit Sci Res Methods 9(4):867–877

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, Chou R, Glanville J, Grimshaw JM, Hróbjartsson A, Lalu MM, Li T, Loder EW, Mayo-Wilson E, McDonald S, McGuinness LA, Stewart LA, Thomas J, Tricco AC, Welch VA, Whiting P, Moher D (2021) The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 372. https://doi.org/10.1136/bmj.n71

Palmer TM, Sterne JAC (eds) (2016) Meta-analysis in stata: an updated collection from the stata journal, 2nd edn. Stata Press, College Station, TX

Pigott TD, Polanin JR (2020) Methodological guidance paper: High-quality meta-analysis in a systematic review. Rev Educ Res 90(1):24–46

Polanin JR, Tanner-Smith EE, Hennessy EA (2016) Estimating the difference between published and unpublished effect sizes: a meta-review. Rev Educ Res 86(1):207–236

Polanin JR, Hennessy EA, Tanner-Smith EE (2017) A review of meta-analysis packages in R. J Edu Behav Stat 42(2):206–242

Polanin JR, Hennessy EA, Tsuji S (2020) Transparency and reproducibility of meta-analyses in psychology: a meta-review. Perspect Psychol Sci 15(4):1026–1041. https://doi.org/10.1177/17456916209064

R Core Team (2021). R: A language and environment for statistical computing . R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/ .

Rauch A (2020) Opportunities and threats in reviewing entrepreneurship theory and practice. Entrep Theory Pract 44(5):847–860

Rauch A, van Doorn R, Hulsink W (2014) A qualitative approach to evidence–based entrepreneurship: theoretical considerations and an example involving business clusters. Entrep Theory Pract 38(2):333–368

Raudenbush SW (2009) Analyzing effect sizes: Random-effects models. In: Cooper H, Hedges LV, Valentine JC (eds) The handbook of research synthesis and meta-analysis, 2nd edn. Russell Sage Foundation, New York, NY, pp 295–315

Rosenthal R (1979) The file drawer problem and tolerance for null results. Psychol Bull 86(3):638

Rothstein HR, Sutton AJ, Borenstein M (2005) Publication bias in meta-analysis: prevention, assessment and adjustments. Wiley, Chichester

Roth PL, Le H, Oh I-S, Van Iddekinge CH, Bobko P (2018) Using beta coefficients to impute missing correlations in meta-analysis research: Reasons for caution. J Appl Psychol 103(6):644–658. https://doi.org/10.1037/apl0000293

Rudolph CW, Chang CK, Rauvola RS, Zacher H (2020) Meta-analysis in vocational behavior: a systematic review and recommendations for best practices. J Vocat Behav 118:103397

Schmidt FL (2017) Statistical and measurement pitfalls in the use of meta-regression in meta-analysis. Career Dev Int 22(5):469–476

Schmidt FL, Hunter JE (2015) Methods of meta-analysis: correcting error and bias in research findings. Sage, Thousand Oaks

Schwab A (2015) Why all researchers should report effect sizes and their confidence intervals: Paving the way for meta–analysis and evidence–based management practices. Entrepreneurship Theory Pract 39(4):719–725. https://doi.org/10.1111/etap.12158

Shaw JD, Ertug G (2017) The suitability of simulations and meta-analyses for submissions to Academy of Management Journal. Acad Manag J 60(6):2045–2049

Soderberg CK (2018) Using OSF to share data: A step-by-step guide. Adv Methods Pract Psychol Sci 1(1):115–120

Stanley TD, Doucouliagos H (2010) Picture this: a simple graph that reveals much ado about research. J Econ Surveys 24(1):170–191

Stanley TD, Doucouliagos H (2012) Meta-regression analysis in economics and business. Routledge, London

Stanley TD, Jarrell SB (1989) Meta-regression analysis: a quantitative method of literature surveys. J Econ Surveys 3:54–67

Steel P, Beugelsdijk S, Aguinis H (2021) The anatomy of an award-winning meta-analysis: Recommendations for authors, reviewers, and readers of meta-analytic reviews. J Int Bus Stud 52(1):23–44

Suurmond R, van Rhee H, Hak T (2017) Introduction, comparison, and validation of Meta-Essentials: a free and simple tool for meta-analysis. Res Synth Methods 8(4):537–553

The Cochrane Collaboration (2020). Review Manager (RevMan) [Computer program] (Version 5.4).

Thomas J, Noel-Storr A, Marshall I, Wallace B, McDonald S, Mavergames C, Glasziou P, Shemilt I, Synnot A, Turner T, Elliot J (2017) Living systematic reviews: 2. Combining human and machine effort. J Clin Epidemiol 91:31–37

Thompson SG, Higgins JP (2002) How should meta-regression analyses be undertaken and interpreted? Stat Med 21(11):1559–1573

Tipton E, Pustejovsky JE, Ahmadi H (2019) A history of meta-regression: technical, conceptual, and practical developments between 1974 and 2018. Res Synth Methods 10(2):161–179

Vevea JL, Woods CM (2005) Publication bias in research synthesis: Sensitivity analysis using a priori weight functions. Psychol Methods 10(4):428–443

Viechtbauer W (2010) Conducting meta-analyses in R with the metafor package. J Stat Softw 36(3):1–48

Viechtbauer W, Cheung MWL (2010) Outlier and influence diagnostics for meta-analysis. Res Synth Methods 1(2):112–125

Viswesvaran C, Ones DS (1995) Theory testing: combining psychometric meta-analysis and structural equations modeling. Pers Psychol 48(4):865–885

Wilson SJ, Polanin JR, Lipsey MW (2016) Fitting meta-analytic structural equation models with complex datasets. Res Synth Methods 7(2):121–139. https://doi.org/10.1002/jrsm.1199

Wood JA (2008) Methodology for dealing with duplicate study effects in a meta-analysis. Organ Res Methods 11(1):79–95


Open Access funding enabled and organized by Projekt DEAL. No funding was received to assist with the preparation of this manuscript.

Author information

Authors and affiliations.

University of Luxembourg, Luxembourg, Luxembourg

Christopher Hansen

Leibniz Institute for Psychology (ZPID), Trier, Germany

Holger Steinmetz

Trier University, Trier, Germany

Erasmus University Rotterdam, Rotterdam, The Netherlands

Wittener Institut Für Familienunternehmen, Universität Witten/Herdecke, Witten, Germany


Corresponding author

Correspondence to Jörn Block .

Ethics declarations

Conflict of interest.

The authors have no relevant financial or non-financial interests to disclose.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Hansen, C., Steinmetz, H. & Block, J. How to conduct a meta-analysis in eight steps: a practical guide. Manag Rev Q 72 , 1–19 (2022). https://doi.org/10.1007/s11301-021-00247-4



Content Analysis | A Step-by-Step Guide with Examples

Published on 5 May 2022 by Amy Luo. Revised on 5 December 2022.

Content analysis is a research method used to identify patterns in recorded communication. To conduct content analysis, you systematically collect data from a set of texts, which can be written, oral, or visual:

  • Books, newspapers, and magazines
  • Speeches and interviews
  • Web content and social media posts
  • Photographs and films

Content analysis can be both quantitative (focused on counting and measuring) and qualitative (focused on interpreting and understanding). In both types, you categorise or ‘code’ words, themes, and concepts within the texts and then analyse the results.

Table of contents

  • What is content analysis used for?
  • Advantages of content analysis
  • Disadvantages of content analysis
  • How to conduct content analysis

What is content analysis used for?

Researchers use content analysis to find out about the purposes, messages, and effects of communication content. They can also make inferences about the producers and audience of the texts they analyse.

Content analysis can be used to quantify the occurrence of certain words, phrases, subjects, or concepts in a set of historical or contemporary texts.

In addition, content analysis can be used to make qualitative inferences by analysing the meaning and semantic relationship of words and concepts.

Because content analysis can be applied to a broad range of texts, it is used in a variety of fields, including marketing, media studies, anthropology, cognitive science, psychology, and many social science disciplines. It has various possible goals:

  • Finding correlations and patterns in how concepts are communicated
  • Understanding the intentions of an individual, group, or institution
  • Identifying propaganda and bias in communication
  • Revealing differences in communication in different contexts
  • Analysing the consequences of communication content, such as the flow of information or audience responses


Advantages of content analysis

  • Unobtrusive data collection

You can analyse communication and social interaction without the direct involvement of participants, so your presence as a researcher doesn’t influence the results.

  • Transparent and replicable

When done well, content analysis follows a systematic procedure that can easily be replicated by other researchers, yielding results with high reliability .

  • Highly flexible

You can conduct content analysis at any time, in any location, and at low cost. All you need is access to the appropriate sources.

Disadvantages of content analysis

  • Reductive

Focusing on words or phrases in isolation can sometimes be overly reductive, disregarding context, nuance, and ambiguous meanings.

  • Subjective

Content analysis almost always involves some level of subjective interpretation, which can affect the reliability and validity of the results and conclusions.

  • Time intensive

Manually coding large volumes of text is extremely time-consuming, and it can be difficult to automate effectively.

How to conduct content analysis

If you want to use content analysis in your research, you need to start with a clear, direct research question.

Next, you follow these five steps.

Step 1: Select the content you will analyse

Based on your research question, choose the texts that you will analyse. You need to decide:

  • The medium (e.g., newspapers, speeches, or websites) and genre (e.g., opinion pieces, political campaign speeches, or marketing copy)
  • The criteria for inclusion (e.g., newspaper articles that mention a particular event, speeches by a certain politician, or websites selling a specific type of product)
  • The parameters in terms of date range, location, etc.

If there are only a small number of texts that meet your criteria, you might analyse all of them. If there is a large volume of texts, you can select a sample .

Step 2: Define the units and categories of analysis

Next, you need to determine the level at which you will analyse your chosen texts. This means defining:

  • The unit(s) of meaning that will be coded. For example, are you going to record the frequency of individual words and phrases, the characteristics of people who produced or appear in the texts, the presence and positioning of images, or the treatment of themes and concepts?
  • The set of categories that you will use for coding. Categories can be objective characteristics (e.g., aged 30–40, lawyer, parent) or more conceptual (e.g., trustworthy, corrupt, conservative, family-oriented).

Step 3: Develop a set of rules for coding

Coding involves organising the units of meaning into the previously defined categories. Especially with more conceptual categories, it’s important to clearly define the rules for what will and won’t be included to ensure that all texts are coded consistently.

Coding rules are especially important if multiple researchers are involved, but even if you’re coding all of the text by yourself, recording the rules makes your method more transparent and reliable.

Step 4: Code the text according to the rules

You go through each text and record all relevant data in the appropriate categories. This can be done manually or aided with computer programs, such as QSR NVivo, Atlas.ti, and Diction, which can help speed up the process of counting and categorising words and phrases.
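If you prefer to script the counting step yourself rather than rely on a dedicated package, a few lines of Python can tally how often each coding category appears across a set of texts. This is only an illustrative sketch: the category keywords and the two example sentences below are invented for demonstration.

    from collections import Counter
    import re

    # Hypothetical coding scheme: each category is defined by a few keyword stems
    categories = {
        "economy": ["economy", "jobs", "tax"],
        "environment": ["climate", "emission", "pollution"],
    }

    texts = [
        "The speech focused on jobs, tax cuts and the economy.",
        "Climate change and rising emissions dominated the debate.",
    ]

    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        for category, stems in categories.items():
            # Count a hit whenever a word in the text starts with one of the category's stems
            counts[category] += sum(word.startswith(stem) for word in words for stem in stems)

    print(counts)  # Counter({'economy': 3, 'environment': 2})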

Step 5: Analyse the results and draw conclusions

Once coding is complete, the collected data is examined to find patterns and draw conclusions in response to your research question. You might use statistical analysis to find correlations or trends, discuss your interpretations of what the results mean, and make inferences about the creators, context, and audience of the texts.


Luo, A. (2022, December 05). Content Analysis | A Step-by-Step Guide with Examples. Scribbr. Retrieved 9 April 2024, from https://www.scribbr.co.uk/research-methods/content-analysis-explained/


Indian Journal of Anaesthesia, 60(9), September 2016

Basic statistical tools in research and data analysis

Zulfiqar Ali

Department of Anaesthesiology, Division of Neuroanaesthesiology, Sheri Kashmir Institute of Medical Sciences, Soura, Srinagar, Jammu and Kashmir, India

S Bala Bhaskar

Department of Anaesthesiology and Critical Care, Vijayanagar Institute of Medical Sciences, Bellary, Karnataka, India

Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.

INTRODUCTION

Statistics is a branch of science that deals with the collection, organisation, analysis of data and drawing of inferences from the samples to the whole population.[ 1 ] This requires a proper design of the study, an appropriate selection of the study sample and choice of a suitable statistical test. An adequate knowledge of statistics is necessary for proper designing of an epidemiological study or a clinical trial. Improper statistical methods may result in erroneous conclusions which may lead to unethical practice.[ 2 ]

A variable is a characteristic that varies from one individual member of a population to another.[ 3 ] Variables such as height and weight are measured by some type of scale, convey quantitative information and are called quantitative variables. Sex and eye colour give qualitative information and are called qualitative variables[ 3 ] [ Figure 1 ].

[Figure 1: Classification of variables]

Quantitative variables

Quantitative or numerical data are subdivided into discrete and continuous measurements. Discrete numerical data are recorded as a whole number such as 0, 1, 2, 3,… (integer), whereas continuous data can assume any value. Observations that can be counted constitute the discrete data and observations that can be measured constitute the continuous data. Examples of discrete data are number of episodes of respiratory arrests or the number of re-intubations in an intensive care unit. Similarly, examples of continuous data are the serial serum glucose levels, partial pressure of oxygen in arterial blood and the oesophageal temperature.

A hierarchical scale of increasing precision can be used for observing and recording the data which is based on categorical, ordinal, interval and ratio scales [ Figure 1 ].

Categorical or nominal variables are unordered. The data are merely classified into categories and cannot be arranged in any particular order. If only two categories exist (as in gender: male and female), it is called dichotomous (or binary) data. The various causes of re-intubation in an intensive care unit due to upper airway obstruction, impaired clearance of secretions, hypoxemia, hypercapnia, pulmonary oedema and neurological impairment are examples of categorical variables.

Ordinal variables have a clear ordering between the variables. However, the ordered data may not have equal intervals. Examples are the American Society of Anesthesiologists status or Richmond agitation-sedation scale.

Interval variables are similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced. A good example of an interval scale is the Fahrenheit degree scale used to measure temperature. With the Fahrenheit scale, the difference between 70° and 75° is equal to the difference between 80° and 85°: The units of measurement are equal throughout the full range of the scale.

Ratio scales are similar to interval scales, in that equal differences between scale values have equal quantitative meaning. However, ratio scales also have a true zero point, which gives them an additional property. The system of centimetres is an example of a ratio scale: there is a true zero point, and a value of 0 cm means a complete absence of length. The thyromental distance of 6 cm in an adult may be twice that of a child in whom it may be 3 cm.

STATISTICS: DESCRIPTIVE AND INFERENTIAL STATISTICS

Descriptive statistics[ 4 ] try to describe the relationship between variables in a sample or population. Descriptive statistics provide a summary of data in the form of mean, median and mode. Inferential statistics[ 4 ] use a random sample of data taken from a population to describe and make inferences about the whole population. It is valuable when it is not possible to examine each member of an entire population. Examples of descriptive and inferential statistics are illustrated in Table 1 .

[Table 1: Examples of descriptive and inferential statistics]

Descriptive statistics

The extent to which the observations cluster around a central location is described by the central tendency and the spread towards the extremes is described by the degree of dispersion.

Measures of central tendency

The measures of central tendency are mean, median and mode.[ 6 ] Mean (or the arithmetic average) is the sum of all the scores divided by the number of scores. Mean may be influenced profoundly by the extreme variables. For example, the average stay of organophosphorus poisoning patients in ICU may be influenced by a single patient who stays in ICU for around 5 months because of septicaemia. The extreme values are called outliers. The formula for the mean is

    \bar{x} = \frac{\sum x}{n}

where x = each observation and n = number of observations.

Median[ 6 ] is defined as the middle of a distribution in a ranked data set (with half of the variables in the sample above and half below the median value), while mode is the most frequently occurring variable in a distribution.

Range defines the spread, or variability, of a sample.[ 7 ] It is described by the minimum and maximum values of the variables. If we rank the data and, after ranking, group the observations into percentiles, we can get better information on the pattern of spread of the variables. In percentiles, we rank the observations into 100 equal parts. We can then describe the 25%, 50%, 75% or any other percentile amount. The median is the 50th percentile. The interquartile range is the middle 50% of the observations about the median (25th–75th percentile).

Variance[ 7 ] is a measure of how spread out the distribution is. It gives an indication of how closely an individual observation clusters about the mean value. The variance of a population is defined by the following formula:

    \sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}

where σ² is the population variance, X̄ is the population mean, Xi is the ith element from the population and N is the number of elements in the population. The variance of a sample is defined by a slightly different formula:

    s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}

where s² is the sample variance, x̄ is the sample mean, xi is the ith element from the sample and n is the number of elements in the sample. The formula for the variance of a population has 'N' as the denominator, whereas the sample variance uses 'n − 1'. The expression 'n − 1' is known as the degrees of freedom and is one less than the number of observations: each observation is free to vary, except the last one, which must take a defined value. The variance is measured in squared units. To make the interpretation of the data simple and to retain the basic unit of observation, the square root of the variance is used. The square root of the variance is the standard deviation (SD).[ 8 ] The SD of a population is defined by the following formula:

    \sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}

where σ is the population SD, X̄ is the population mean, Xi is the ith element from the population and N is the number of elements in the population. The SD of a sample is defined by a slightly different formula:

    s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}

where s is the sample SD, x̄ is the sample mean, xi is the ith element from the sample and n is the number of elements in the sample. An example of the calculation of variance and SD is illustrated in Table 2 .

[Table 2: Example of mean, variance and standard deviation]
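As a quick illustration of these descriptive measures, Python's built-in statistics module computes them directly. This is only a sketch; the ICU length-of-stay figures are invented for the example.

    import statistics

    # Hypothetical ICU length of stay (days) for nine patients; the 150-day stay is an outlier
    stay = [2, 3, 3, 4, 5, 5, 5, 7, 150]

    print(statistics.mean(stay))      # about 20.4 – pulled upwards by the outlier
    print(statistics.median(stay))    # 5 – unaffected by the extreme value
    print(statistics.mode(stay))      # 5 – the most frequently occurring value
    print(statistics.variance(stay))  # sample variance (divides by n − 1)
    print(statistics.stdev(stay))     # sample standard deviation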

Normal distribution or Gaussian distribution

Most biological variables cluster around a central value, with symmetrical positive and negative deviations about this point.[ 1 ] The standard normal distribution curve is a symmetrical, bell-shaped curve. In a normal distribution, about 68% of the scores fall within 1 SD of the mean, around 95% within 2 SDs and about 99.7% within 3 SDs of the mean [ Figure 2 ].

[Figure 2: Normal distribution curve]
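This 68–95–99.7 pattern can be checked numerically with SciPy's standard normal distribution object. A small sketch, assuming SciPy is installed:

    from scipy.stats import norm

    # Proportion of a standard normal distribution lying within k SDs of the mean
    for k in (1, 2, 3):
        coverage = norm.cdf(k) - norm.cdf(-k)
        print(f"within {k} SD: {coverage:.1%}")   # roughly 68.3%, 95.4% and 99.7%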

Skewed distribution

It is a distribution with an asymmetry of the variables about its mean. In a negatively skewed distribution [ Figure 3 ], the mass of the distribution is concentrated on the right, leading to a longer left tail. In a positively skewed distribution [ Figure 3 ], the mass of the distribution is concentrated on the left, leading to a longer right tail.

[Figure 3: Curves showing negatively skewed and positively skewed distributions]

Inferential statistics

In inferential statistics, data are analysed from a sample to make inferences in the larger collection of the population. The purpose is to answer or test the hypotheses. A hypothesis (plural hypotheses) is a proposed explanation for a phenomenon. Hypothesis tests are thus procedures for making rational decisions about the reality of observed effects.

Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between 0 and 1 (where 0 indicates impossibility and 1 indicates certainty).

In inferential statistics, the term ‘null hypothesis’ ( H 0 ‘ H-naught ,’ ‘ H-null ’) denotes that there is no relationship (difference) between the population variables in question.[ 9 ]

The alternative hypothesis ( H 1 or H a ) denotes that a relationship (difference) between the population variables does exist.[ 9 ]

The P value (or the calculated probability) is the probability of the event occurring by chance if the null hypothesis is true. The P value is a number between 0 and 1 and is interpreted by researchers when deciding whether to reject or retain the null hypothesis [ Table 3 ].

[Table 3: P values with interpretation]

If the P value is less than the arbitrarily chosen value (known as α or the significance level), the null hypothesis (H0) is rejected [ Table 4 ]. However, if the null hypothesis (H0) is incorrectly rejected, this is known as a Type I error.[ 11 ] Further details regarding alpha error, beta error and sample size calculation, and the factors influencing them, are dealt with in another section of this issue by Das S et al .[ 12 ]

[Table 4: Illustration of the null hypothesis]
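To make the decision rule concrete, here is a minimal Python sketch (assuming a recent version of SciPy and using invented coin-toss data) that computes an exact P value and compares it with α = 0.05:

    from scipy.stats import binomtest

    # H0: the coin is fair (p = 0.5). Suppose we observe 16 heads in 20 tosses.
    result = binomtest(16, n=20, p=0.5)
    alpha = 0.05

    print(result.pvalue)          # about 0.012
    print(result.pvalue < alpha)  # True – reject H0 at the 5% significance level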

PARAMETRIC AND NON-PARAMETRIC TESTS

Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests.[ 13 ]

Two most basic prerequisites for parametric statistical analysis are:

  • The assumption of normality which specifies that the means of the sample group are normally distributed
  • The assumption of equal variance which specifies that the variances of the samples and of their corresponding population are equal.

However, if the distribution of the sample is skewed towards one side or the distribution is unknown due to the small sample size, non-parametric[ 14 ] statistical techniques are used. Non-parametric tests are used to analyse ordinal and categorical data.

Parametric tests

The parametric tests assume that the data are on a quantitative (numerical) scale, with a normal distribution of the underlying population. The samples have the same variance (homogeneity of variances). The samples are randomly drawn from the population, and the observations within a group are independent of each other. The commonly used parametric tests are the Student's t -test, analysis of variance (ANOVA) and repeated measures ANOVA.

Student's t -test

Student's t -test is used to test the null hypothesis that there is no difference between the means of the two groups. It is used in three circumstances:

  • To test if the mean of a single sample differs significantly from a known population mean (the one-sample t -test). The formula is:

    t = \frac{\bar{X} - \mu}{SE}

where X̄ = sample mean, μ = population mean and SE = standard error of the mean.

  • To test if the population means estimated by two independent samples differ significantly (the unpaired t -test). The formula is:

    t = \frac{\bar{X}_1 - \bar{X}_2}{SE_{(\bar{X}_1 - \bar{X}_2)}}

where X̄1 − X̄2 is the difference between the means of the two groups and SE denotes the standard error of this difference.

  • To test if the population means estimated by two dependent samples differ significantly (the paired t -test). A usual setting for paired t -test is when measurements are made on the same subjects before and after a treatment.

The formula for paired t -test is:

    t = \frac{\bar{d}}{SE_{\bar{d}}}

where d̄ is the mean difference and SE denotes the standard error of this difference.

The group variances can be compared using the F -test. The F -test is the ratio of variances (var1/var2). If F differs significantly from 1.0, then it is concluded that the group variances differ significantly.
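As an illustration of how these tests are run in practice, the sketch below uses SciPy; the blood pressure readings and the before/after measurements are invented for the example.

    from scipy import stats

    # Hypothetical systolic blood pressure readings (mmHg) from two independent groups
    group_a = [118, 122, 125, 130, 127, 121]
    group_b = [131, 135, 129, 140, 138, 133]

    # One-sample t-test: does group_a differ from a known population mean of 120 mmHg?
    print(stats.ttest_1samp(group_a, popmean=120))

    # Unpaired (two-sample) t-test: do the two group means differ?
    print(stats.ttest_ind(group_a, group_b))

    # Paired t-test: the same five subjects measured before and after a treatment
    before = [140, 138, 150, 145, 160]
    after = [132, 135, 141, 143, 150]
    print(stats.ttest_rel(before, after))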

Analysis of variance

The Student's t -test cannot be used for comparison of three or more groups. The purpose of ANOVA is to test if there is any significant difference between the means of two or more groups.

In ANOVA, we study two variances – (a) between-group variability and (b) within-group variability. The within-group variability (error variance) is the variation that cannot be accounted for in the study design. It is based on random differences present in our samples.

However, the between-group (or effect variance) is the result of our treatment. These two estimates of variances are compared using the F-test.

A simplified formula for the F statistic is:

    F = \frac{MS_b}{MS_w}

where MS b is the mean squares between the groups and MS w is the mean squares within groups.
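A minimal one-way ANOVA in Python might look like the sketch below (SciPy assumed installed; the recovery-time figures are invented):

    from scipy import stats

    # Hypothetical recovery times (days) under three different treatments
    treatment_1 = [5, 6, 7, 6, 5]
    treatment_2 = [8, 9, 7, 8, 9]
    treatment_3 = [6, 7, 6, 8, 7]

    # One-way ANOVA: is there any significant difference between the three group means?
    f_stat, p_value = stats.f_oneway(treatment_1, treatment_2, treatment_3)
    print(f_stat, p_value)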

Repeated measures analysis of variance

As with ANOVA, repeated measures ANOVA analyses the equality of means of three or more groups. However, a repeated measure ANOVA is used when all variables of a sample are measured under different conditions or at different points in time.

As the variables are measured from a sample at different points of time, the measurement of the dependent variable is repeated. Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures: The data violate the ANOVA assumption of independence. Hence, in the measurement of repeated dependent variables, repeated measures ANOVA should be used.

Non-parametric tests

When the assumptions of normality are not met and the sample means are not normally distributed, parametric tests can lead to erroneous results. Non-parametric tests (distribution-free tests) are used in such situations as they do not require the normality assumption.[ 15 ] Non-parametric tests may fail to detect a significant difference when compared with a parametric test. That is, they usually have less power.

As is done for the parametric tests, the test statistic is compared with known values for the sampling distribution of that statistic and the null hypothesis is accepted or rejected. The types of non-parametric analysis techniques and the corresponding parametric analysis techniques are delineated in Table 5 .

[Table 5: Analogues of parametric and non-parametric tests]

Median test for one sample: The sign test and Wilcoxon's signed rank test

The sign test and Wilcoxon's signed rank test are used for median tests of one sample. These tests examine whether one instance of sample data is greater or smaller than the median reference value.

This test examines the hypothesis about the median θ0 of a population. It tests the null hypothesis H0: θ = θ0. When the observed value (Xi) is greater than the reference value (θ0), it is marked with a + sign. If the observed value is smaller than the reference value, it is marked with a − sign. If the observed value is equal to the reference value (θ0), it is eliminated from the sample.

If the null hypothesis is true, there will be an equal number of + signs and − signs.

The sign test ignores the actual values of the data and only uses + or − signs. Therefore, it is useful when it is difficult to measure the values.

Wilcoxon's signed rank test

There is a major limitation of sign test as we lose the quantitative information of the given data and merely use the + or – signs. Wilcoxon's signed rank test not only examines the observed values in comparison with θ0 but also takes into consideration the relative sizes, adding more statistical power to the test. As in the sign test, if there is an observed value that is equal to the reference value θ0, this observed value is eliminated from the sample.

Wilcoxon's rank sum test ranks all data points in order, calculates the rank sum of each sample and compares the difference in the rank sums.

Mann-Whitney test

It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other.

Mann–Whitney test compares all data (xi) belonging to the X group and all data (yi) belonging to the Y group and calculates the probability of xi being greater than yi: P (xi > yi). The null hypothesis states that P (xi > yi) = P (xi < yi) =1/2 while the alternative hypothesis states that P (xi > yi) ≠1/2.

Kolmogorov-Smirnov test

The two-sample Kolmogorov-Smirnov (KS) test was designed as a generic method to test whether two random samples are drawn from the same distribution. The null hypothesis of the KS test is that both distributions are identical. The statistic of the KS test is a distance between the two empirical distributions, computed as the maximum absolute difference between their cumulative curves.

Kruskal-Wallis test

The Kruskal–Wallis test is a non-parametric test to analyse the variance.[ 14 ] It analyses if there is any difference in the median values of three or more independent samples. The data values are ranked in an increasing order, and the rank sums calculated followed by calculation of the test statistic.

Jonckheere test

In contrast to the Kruskal–Wallis test, the Jonckheere test assumes an a priori ordering, which gives it more statistical power than the Kruskal–Wallis test.[ 14 ]

Friedman test

The Friedman test is a non-parametric test for testing the difference between several related samples. It is an alternative to repeated measures ANOVA, used when the same parameter has been measured under different conditions on the same subjects.[ 13 ]
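The commonly used non-parametric tests described above are all available in SciPy. The sketch below is only illustrative: the pain scores are invented, and the same three samples are reused purely to show the call signatures.

    from scipy import stats

    # Hypothetical pain scores (0–10): ordinal data, analysed with non-parametric tests
    group_1 = [3, 4, 2, 5, 4, 3, 2]
    group_2 = [6, 5, 7, 6, 8, 5, 7]
    group_3 = [4, 5, 6, 5, 4, 6, 5]

    print(stats.mannwhitneyu(group_1, group_2))                # two independent samples
    print(stats.wilcoxon(group_1, group_2))                    # two paired (dependent) samples
    print(stats.kruskal(group_1, group_2, group_3))            # three or more independent samples
    print(stats.friedmanchisquare(group_1, group_2, group_3))  # three or more related samples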

Tests to analyse the categorical data

The Chi-square test, Fisher's exact test and McNemar's test are used to analyse categorical or nominal variables. The Chi-square test compares the frequencies and tests whether the observed data differ significantly from the expected data if there were no differences between groups (i.e., the null hypothesis). It is calculated by the sum of the squared difference between observed ( O ) and expected ( E ) data (or the deviation, d ) divided by the expected data, as in the following formula:

    \chi^2 = \sum \frac{(O - E)^2}{E}

A Yates correction factor is used when the sample size is small. Fisher's exact test is used to determine if there are non-random associations between two categorical variables. It does not assume random sampling, and instead of referring a calculated statistic to a sampling distribution, it calculates an exact probability. McNemar's test is used for paired nominal data. It is applied to a 2 × 2 table with paired-dependent samples. It is used to determine whether the row and column frequencies are equal (that is, whether there is ‘marginal homogeneity’). The null hypothesis is that the paired proportions are equal. The Mantel-Haenszel Chi-square test is a multivariate test as it analyses multiple grouping variables. It stratifies according to the nominated confounding variables and identifies any that affect the primary outcome variable. If the outcome variable is dichotomous, then logistic regression is used.
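For categorical data arranged in a 2 × 2 table, both the Chi-square test and Fisher's exact test can be run with SciPy; the counts below are invented for illustration.

    from scipy.stats import chi2_contingency, fisher_exact

    # Hypothetical 2 × 2 table: complication (yes/no) by treatment (A/B)
    table = [[12, 38],
             [5, 45]]

    # Chi-square test (Yates' continuity correction is applied by default for 2 × 2 tables)
    chi2, p, dof, expected = chi2_contingency(table)
    print(chi2, p)

    # Fisher's exact test – preferred when expected cell counts are small
    odds_ratio, p_exact = fisher_exact(table)
    print(odds_ratio, p_exact)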

SOFTWARES AVAILABLE FOR STATISTICS, SAMPLE SIZE CALCULATION AND POWER ANALYSIS

Numerous statistical software systems are available currently. The commonly used software systems are Statistical Package for the Social Sciences (SPSS – manufactured by IBM Corporation), Statistical Analysis System (SAS – developed by SAS Institute, North Carolina, United States of America), R (designed by Ross Ihaka and Robert Gentleman from the R core team), Minitab (developed by Minitab Inc.), Stata (developed by StataCorp) and MS Excel (developed by Microsoft).

There are a number of web resources which are related to statistical power analyses. A few are:

  • StatPages.net – provides links to a number of online power calculators
  • G-Power – provides a downloadable power analysis program that runs under DOS
  • Power analysis for ANOVA designs – an interactive site that calculates the power or sample size needed to attain a given power for one effect in a factorial ANOVA design
  • SPSS makes a program called SamplePower. It gives an output of a complete report on the computer screen which can be cut and pasted into another document.

It is important that a researcher knows the concepts of the basic statistical methods used for conducting a research study. This will help to conduct an appropriately well-designed study leading to valid and reliable results. Inappropriate use of statistical techniques may lead to faulty conclusions, inducing errors and undermining the significance of the article. Bad statistics may lead to bad research, and bad research may lead to unethical practice. Hence, an adequate knowledge of statistics and the appropriate use of statistical tests are important. An appropriate knowledge of the basic statistical methods will go a long way in improving research designs and producing quality medical research which can be utilised for formulating evidence-based guidelines.

Financial support and sponsorship

Conflicts of interest.

There are no conflicts of interest.


45 Analysis Examples


Analysis is a higher-order thinking skill that demonstrates your ability to compare, contrast, organize, and distinguish information. This can help us come to well-informed conclusions and evaluations.

Analytical thinking refers to a range of higher-order cognitive processes and skills. For instance, Spaska et al. (2021) identify the key components of the cognitive function of analysis as:

“…in-depth search, data analysis and evaluation, problem-solving, and decision-making.”

These components are essential to:

“…reasoning, planning and conducting a  learning inquiry process , interpreting the yielded data and findings followed by drawing conclusions” (p. 880).

Analysis Examples

1. Classifying or Categorizing

Classifying or categorizing involves arranging data, information, or objects into groups based on their shared attributes or characteristics. This process aids in understanding and organizing vast amounts of information, making it easier to analyze and interpret.

Example of Classification You have a list of different animals. You classify them into categories such as mammals, birds, reptiles, and amphibians based on their specific characteristics.

2. Prioritizing

Prioritizing requires identifying the order of importance of tasks, problems, or potential solutions. It’s a valuable process in effective time management and decision making, ensuring that limited resources are used efficiently.

Example You have several tasks to do: complete a project report, answer emails, attend a meeting, and organize your workspace. You decide to prioritize by the deadline, starting with the project report because it’s due earlier than the other tasks.

3. Sequencing Events

Sequencing refers to the practice of arranging information or events in a specific order. This process is crucial in understanding timelines, processes, or the order of procedures in a system.

Example In a recipe, you sequence the cooking steps from the first – such as chopping the vegetables, to the last – like sprinkling on the garnish and serving the dish.

4. Identifying Patterns

Identifying patterns involves recognizing and discovering recurrent events, behaviors, or numbers. The capability to identify patterns aids in predicting and understanding future occurrences or trends.

Example While studying monthly sales data, you identify a pattern where sales increase during the holiday seasons and decrease directly afterward, helping you predict future sales trends.

5. Drawing Conclusions

Drawing conclusions entails making an inference or a final judgment based on the gathered data or facts. This process is an essential part of decision-making and problem-solving.

Example You conduct a survey on customer satisfaction and find that the majority are satisfied with the product but dislike the customer service. You conclude that to increase customer satisfaction, the quality of customer service needs to improve.

6. Making Predictions

Making predictions involves speculating about a future event or outcome based on the available information or observed trends. It is an essential aspect of strategic planning and decision-making processes.

Example Based on the rising trend in your website’s traffic over the past few months, you predict that the site will hit a specific number of visitors by the end of the year.

7. Evaluating Evidence

Evaluating evidence requires assessing the reliability, validity, and relevance of the data or evidence related to a situation or problem. This procedure is vital in critical thinking , research, and decision-making processes. We may also engage in self-evaluation , where we reflect on ourselves and rate our behavior or performance in a recent task.

Example Before writing a scientific research report, you evaluate the data collected from experiments and studies, assessing precision and validity.

8. Deductive Reasoning

Deductive reasoning is a logical process where conclusions are drawn from a set of premises or beliefs which are generally accepted as true. It is used in problem-solving and decision-making processes.

Example of Deductive Reasoning All fruits contain seeds (premise 1), apples are a type of fruit (premise 2), therefore, apples contain seeds (conclusion drawn by deductive reasoning).

9. Inductive Reasoning

Inductive reasoning is a type of reasoning where general conclusions are drawn from specific examples. The conclusions are probable rather than certain, based upon the evidence given.

Example of Inductive Reasoning In all your previous experiences, birds have always had feathers, so you conclude that all birds have feathers (conclusion drawn by inductive reasoning).

10. Brainstorming

Brainstorming is a creative process used to generate multiple diverse ideas as a response to a problem or question. It encourages free-thinking and uninhibited idea generation to explore all possible solutions or concepts.

Example of Brainstorming While developing a new product, your team engages in a brainstorming session, suggesting several product designs, features, and marketing strategies.

11. SWOT Analysis

A SWOT Analysis is a strategic planning tool utilized to identify and analyze the strengths, weaknesses, opportunities, and threats in a project or business venture. It provides an organized listing of a company’s characteristics, providing a framework for understanding its capabilities and potential.

Example of SWOT Analysis A tech startup conducts a SWOT Analysis – Strengths: innovative technology, expert team; Weaknesses: lack of brand awareness, limited financial resources; Opportunities: emerging markets, partnerships; Threats: competitive market, changing technology.

12. Root Cause Analysis

Root Cause Analysis (RCA) is a systematic approach to identify the root or fundamental underlying causes behind a problem or incident. The goal is to find and fix the cause rather than merely dealing with the symptoms.

Example of Root Cause Analysis After a data breach, a business conducts a Root Cause Analysis and discovers a flaw in their cybersecurity system was the primary cause.

13. PESTLE Analysis

A PESTLE Analysis is a framework that focuses on the Political, Economic, Sociological, Technological, Legal, and Environmental factors influencing an organization or project. It helps to identify external forces that could impact a business’s performance.

Example A car manufacturing company does a PESTLE Analysis and realizes the stringent laws on emission standards (Legal) and the trend toward eco-friendly solutions (Environmental) in some markets may impact their production and sales.

14. Five Whys

The Five Whys technique is a straightforward problem-solving method that explores the cause-and-effect relationships underlying a specific problem. The strategy involves asking “Why?” five times to get to the root cause.

Example A company’s project is delayed. Through Five Whys, they discover the root cause is a miscommunication regarding the project’s starting deadline.

15. Gap Analysis

Gap Analysis refers to the method used to identify the difference between the current state and the desired future state of a business or a project. It helps to understand what steps should be taken to drive the improvement and meet the set objectives.

Example A hotel conducts a Gap Analysis between their current guest satisfaction rates and their desired rate, revealing areas like room service quality and check-in process need improving.

16. Cost-Benefit Analysis

Cost-Benefit Analysis (CBA) is a process used to weigh the potential costs of a decision or investment against its possible benefits. The goal is to determine if the proposed action is financially viable and will bring about desirable results.

Example of Cost-Benefit Analysis An organization conducts a Cost-Benefit Analysis before deciding to purchase new computers, considering factors such as the cost of equipment, installation, and training against the productivity increase.

17. Mind Mapping

Mind mapping is a visual method used to structure, classify, and represent ideas. It encourages brainstorming by illustrating connections between thoughts, furthering the understanding and generation of new concepts.

Example Planning a company event, you create a mind map. The core idea or goal is in the center, with branches illustrating different components such as venue, attendees, and catering, offering a clear visual overview of the project.

18. Surveying or Polling

Surveying or polling is a data gathering method, typically in the form of a questionnaire sent to a specified population. It’s useful for collecting statistical data, gauging public opinion, or gathering feedback.

Example Your business conducts a customer satisfaction survey, reaching out to recent consumers to gather data on their experiences, their likes, and areas where improvements are needed.

19. Questioning or Interviewing

Questioning or interviewing involves collecting information, insights, or opinions from individuals via direct questioning. It’s a useful tool in research, job recruitment, and journalistic processes.

Example As part of a market research strategy, you interview your target audience to understand their needs, preferences, and purchasing behaviors.

20. Testing Hypotheses

Testing hypotheses is a part of the scientific method involving the formulation of propositions, conducting an experiment to test them, and analyzing the results. It aids in confirming or disproving assumptions, furthering knowledge and understanding.

Example of Hypothesis Testing You hypothesize that promoting a product on social media will increase sales. After a month of running the campaign, you analyze the sales figures to test the hypothesis.

21. Simulating or Modeling

Simulating or modeling involves creating a virtual representation or model of a system or scenario to predict outcomes, study processes, or conduct experiments. Simulations offer a safe and cost-effective way to analyze complex systems or high-risk situations.

Example For city planning, a 3D model of the city is developed to simulate the effects of implementing various traffic control methods.

22. Correlating Data Points

Correlating data points is a statistical method used to determine the relationship between two or more variables. It allows for the prediction of one variable based on the value of another and aids in identifying patterns.

Example After correlating weather data and ice cream sales, you find a positive correlation: as temperature increases, so do ice cream sales.
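A correlation like the one in this example can be computed in a couple of lines with SciPy; the temperature and sales figures below are invented for the sketch.

    from scipy.stats import pearsonr

    # Hypothetical daily figures: temperature (°C) and ice cream sales (units)
    temperature = [18, 21, 24, 27, 30, 33]
    sales = [120, 135, 160, 180, 210, 240]

    r, p_value = pearsonr(temperature, sales)
    print(r, p_value)   # r close to +1 indicates a strong positive correlation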

23. Synthesizing Information

Synthesizing information is combining data from multiple sources to draw conclusions, create new ideas, or generate knowledge. It is a critical process in research, problem-solving, and decision-making.

Example of Synthesis Writing a literature review of several research papers on climate change, you synthesize the information to provide a broader understanding of the topic.

24. Interpreting Visuals (like charts or graphs)

Interpreting visuals refers to extracting and understanding information represented visually, such as in charts, graphs, or images. It aids in transforming complex data into understandable and digestible information.

Example By interpreting a line graph depicting monthly profit, you can grasp the trend and fluctuations in the company’s profit over time.

25. Finding Anomalies or Outliers

Finding anomalies or outliers involves identifying data points that deviate significantly from the norm or expected range within a data set. This process can highlight errors or unique situations, which are crucial considerations in data analysis.

Anomaly Example In a medical trial, most participants show improved health after a new treatment. However, a few show severe side effects – these are the anomalies or outliers.
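A simple way to flag outliers programmatically is to mark values that fall more than a chosen number of standard deviations from the mean; the measurements below are invented for the sketch.

    import statistics

    # Hypothetical measurements with one suspicious value
    values = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 9.6]

    mean = statistics.mean(values)
    sd = statistics.stdev(values)

    # Flag observations more than 2 standard deviations from the mean
    outliers = [v for v in values if abs(v - mean) > 2 * sd]
    print(outliers)   # [9.6]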

26. Sampling

Sampling is a statistical method where a subset of a group is selected to represent the whole population. The results of the sample can help make inferences about the larger group.

Example A food company wants to test a new product. Instead of giving it to all their customers directly, they select a sample of customers, offering a comprehensive and manageable way to gauge reactions.

27. Reviewing Literature

Reviewing literature involves critically reading and summarizing scholarly articles , books, or other resources relevant to a particular field or topic. It highlights trends, gaps, and controversies within the field and offers a foundation for further research.

Example of Literature Review As part of a psychology research project, you review literature on cognitive behavioral therapies, understanding its efficacy, application areas, and limitations reported in previous studies.

28. Summarizing Findings

Summarizing findings is the process of condensing information, data, or results into a brief, accessible format. It aids in communicating the essence of a study, research, or procedure without delving into intricate details.

Example After running a sales campaign, you summarize the findings to report to stakeholders, detailing the increase in sales, customer behavior, most successful strategies, and areas needing improvement.

29. Comparing and Contrasting

Comparing and contrasting involves identifying similarities between two or more items (comparison) while also noting their differences (contrast). This analysis helps in making informed decisions, understanding relationships, or emphasizing unique characteristics.

Example of Compare and Contrast You compare and contrast two mobile phones before purchasing. Similarities may include both having high-resolution cameras and differences might be in battery life or screen size. This comparison helps inform your purchasing decision.

30. Identifying Patterns

Identifying patterns involves observing the repetitive occurrences or trends in a dataset or behavior. Recognizing these patterns helps in anticipating future events or making decisions based on the repetitive or predictable nature.

Example of Pattern Recognition A stock market analyst identifies a pattern in a stock’s performance over several years, noticing that it dips in January and rises in April, providing valuable insights for investment decisions.

31. Problem Decomposition

Problem decomposition, also known as problem-breaking, is the process of breaking down a complex problem into smaller, more manageable parts. This facilitates easier analysis and problem-solving.

Example A software developer breaks down the problem of a faulty software into component issues: coding errors, user interface glitches, and database connectivity issues, each then addressed individually.

32. Evaluating Solutions

Evaluating solutions involves assessing various solutions to a problem concerning their effectiveness, potential impact, and feasibility. It is crucial in decision-making processes and ensuring an optimal solution is chosen.

Example Your business has a profit reduction problem. After generating various solutions, you evaluate each one considering factors such as cost, time, and potential impact to select the best approach.

33. Bias Identification

Bias identification involves recognizing subjective or prejudiced views that may influence judgment or analysis. Identifying biases aids in ensuring objective decision-making and analysis.

Example In a psychological study, you identify a selection bias as the study’s participants are all from the same city, potentially skewing the research results.

34. Statistical Analysis

Statistical analysis encompasses the collection, analysis, interpretation, presentation, and modeling of data. By applying statistical techniques, we can extract meaningful insights and make informed decisions.

Example An ecommerce company uses statistical analysis to understand customer behavior, looking at purchase rates, return rates, and cart abandonment rates to improve their strategy.

35. Decision Trees

Decision trees are graphical representations of potential outcomes or decisions, structured in a tree-like model. They help visualize complex decision-making processes, highlighting possibilities and consequences.

Example Planning the launch of a new product, you map out a decision tree, detailing decisions like pricing strategy, different marketing approaches, and potential market reactions.

36. Cause-and-effect Analysis

Cause-and-effect analysis, also known as the Fishbone diagram or Ishikawa diagram, is a tool to identify potential factors causing an overall effect or problem. This process assists in deep diving into root causes of a problem.

Example of Cause-and-Effect Analysis Production quality has dropped in your factory. A cause-and-effect analysis identifies several causes such as outdated machinery, untrained staff, and inconsistent raw material quality.

37. Use of Scientific Methods

The use of scientific methods involves systematic observation , measurement, experimentation, and testing to create or revise knowledge. It’s a cornerstone in fields like psychology, biology, physics, and others where hypotheses need rigorous testing.

Example of the Scientific Method Researchers using the scientific method to understand a disease might start with an observation, develop a hypothesis, conduct experiments, analyze data, and then affirm or modify their hypothesis.

38. Critical Reading

Critical reading is an active, analytical way of reading that involves questioning the content, assessing the evidence provided, determining the implication, and judging the effectiveness of arguments.

Example In reviewing a scientific research paper, you critically read, looking at the methodology, evaluating the evidence, identifying potential biases, and judging the soundness of the conclusions.

39. Logical Reasoning

Logical reasoning entails using reason, logic, and systematic steps to arrive at a conclusion from one or more premises. It is a crucial aspect of problem-solving, decision-making, and academic studies.

Example If all dogs bark (premise) and your pet is a dog, then logically, your pet barks.

40. Scenario Planning

Scenario planning is a strategic planning method used to forecast different futures and how they could affect an organization or situation. It assists in designing flexible long-term plans.

Example An insurance company practices scenario planning, examining potential situations such as natural disasters, economic downturns, and changes in regulations, planning their strategies accordingly.

41. Data Visualization

Data visualization is the representation of data or information in a graphical format. It makes complex data more understandable and accessible, revealing trends, correlations, and patterns that might go unnoticed in text-based data.

Example In a business meeting, you present a colorful, interactive dashboard that visualizes sales data, making it easier for the team to comprehend the sales performance.
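A minimal version of such a chart can be produced with matplotlib; the monthly profit figures are invented for the example.

    import matplotlib.pyplot as plt

    # Hypothetical monthly profit figures (in thousands)
    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
    profit = [42, 45, 39, 51, 58, 62]

    plt.plot(months, profit, marker="o")
    plt.title("Monthly profit")
    plt.xlabel("Month")
    plt.ylabel("Profit (thousands)")
    plt.show()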

42. Inferential Thinking

Inferential thinking involves making inferences or conclusions based on evidence and reasoning but without the direct confirmation of a statement. It is often used in situations where an immediate or clear-cut judgment cannot be made.

Example of Inferential Thinking A physician uses inferential thinking when diagnosing a patient. Based on the patient’s symptoms, medical history, and lab results, the doctor makes an educated inference about the likely cause of the patient’s illness.

43. Assessing

Assessing refers to the process of appraising or evaluating something, often to determine its value, importance, size, or other qualities. By careful and systematic consideration, assessing aids in decision making and problem-solving.

Example A teacher assesses a student’s project, evaluating the accuracy of the content , the clarity of presentation, creativity, and the quality of research to give the project a final grade.

44. Critiquing

Critiquing involves thoroughly examining and interpreting a situation, concept, or work, followed by giving constructive feedback or evaluation. It provides a thorough understanding of the work and valuable insights for improvement.

Example An art critic critiques a painting by analyzing its elements — composition, color usage, subject matter, and paint application — then provides an assessment of the piece’s impact and effectiveness in achieving its purpose.

45. Deconstructing

Deconstructing is a critical strategy that involves breaking down a concept, narrative, or structure to understand its underlying assumptions , ideas, or themes. By understanding these elements, it’s possible to have a more profound understanding of the whole.

Example A literature professor deconstructs a novel with her students by examining its narrative structure, character development, themes, and stylistic devices to uncover underlying messages and cultural contexts .

Analysis and Bloom’s Taxonomy

A popular method of conceptualizing the concept of analysis is Bloom’s Taxonomy , which demonstrates where analysis sits on a rank order of cognitive processes:

[Figure: Bloom's taxonomy of cognitive processes]

Here, we can see that analysis requires a degree of effortful processing that is more complex than mere remembering, understanding, or applying, but sits below evaluation and creation on the tiers of cognition.

According to Bloom, analysis verbs can include:

  • Differentiate
  • Deconstruct
  • Investigate

Analysis is an essential skill for developing a deep understanding of subject matter and, for students, is key to demonstrating your depth of knowledge – especially in essay writing. To achieve this, consider using strategies such as compare and contrast, and frameworks such as SWOT analysis, which give you a structured way to reach an analytical level of thinking.


Chris Drew (PhD)

Dr. Chris Drew is the founder of the Helpful Professor. He holds a PhD in education and has published over 20 articles in scholarly journals. He is the former editor of the Journal of Learning Development in Higher Education.



Unit of Analysis: Definition, Types & Examples


The unit of analysis is the people or things whose qualities will be measured. The unit of analysis is an essential part of a research project. It’s the main thing that a researcher looks at in his research.

A unit of analysis is the object about which you hope to have something to say at the end of your analysis, perhaps the major subject of your research.

In this blog, we will cover:

  • Definition of “unit of analysis”
  • Types of “unit of analysis”

What is a unit of analysis?

A unit of analysis is the thing you want to discuss after your research, probably what you would regard to be the primary emphasis of your research.

The researcher plans to comment on the primary topic or object in the research as a unit of analysis. The research question plays a significant role in determining it. The “who” or “what” that the researcher is interested in investigating is, to put it simply, the unit of analysis.

In his book “Man, the State, and War” (first published in 1959), Kenneth Waltz analyses the causes of war at three distinct levels: the individual, the state, and the international system.

Understanding the reasoning behind the unit of analysis is vital. The likelihood of fruitful research increases if the rationale is understood. An individual, group, organization, nation, social phenomenon, etc., are a few examples.


In business research, there is an almost unlimited range of possible units of analysis. Even though the most typical unit of analysis is the individual, many research questions can be more precisely answered by looking at other types of units. Let's find out.

Individual Level

The most prevalent unit of analysis in business research is the individual. These are the primary analytical units. The researcher may be interested in looking into:

  • Employee actions
  • Perceptions
  • Attitudes, or opinions.

Employees may come from wealthy or low-income families, as well as from rural or metropolitan areas.

A researcher might investigate if personnel from rural areas are more likely to arrive on time than those from urban areas. Additionally, he can check whether workers from rural areas who come from poorer families arrive on time compared to those from rural areas who come from wealthy families.

Each time, the individual (employee) serving as the analytical unit is discussed and explained. Employee analysis as a unit of analysis can shed light on issues in business, including customer and human resource behavior.

For example, employee work satisfaction and consumer purchasing patterns impact business, making research into these topics vital.

Psychologists typically concentrate on the study of individuals. The study of individuals may significantly aid the success of a firm, as their knowledge and experiences reveal vital information. Individuals are therefore heavily utilized in business research.

Aggregates Level

People are not usually the focus of social science research. By combining the reactions of individuals, social scientists frequently describe and explain social interactions, communities, and groupings. Additionally, they research the collective of individuals, including communities, groups, and countries.

Aggregate levels can be divided into two types: Groups (groups with an ad hoc structure) and Organizations (groups with a formal organization).

Groups of people make up the following levels of the unit of analysis. A group is defined as two or more individuals interacting, having common traits, and feeling connected to one another. 

Many definitions also emphasize interdependence or objective resemblance (Turner, 1982; Platow, Grace, & Smithson, 2011) and those who identify as group members (Reicher, 1982) .

As a result, society and gangs serve as examples of groups. According to Webster’s Online Dictionary (2012), they can resemble some clubs but be far less formal.

Studies of siblings, identical twins, families, and small-group functioning are examples of research with groups as the unit of analysis.

In such circumstances, a whole group can be compared to another. Families, gender-specific groups, friendship circles, Facebook groups, and work departments can all serve as groups.

By analyzing groups, researchers can learn how they form and how age, experience, class, and gender affect them. When aggregated, individuals’ data describe the group to which they belong.


Sociologists study groups, and so do economists. Businesspeople form teams to complete projects and are continually researching groups and group behavior.

Organizations

The next level of the unit of analysis is the organization, a formally structured group of people. Organizations include businesses, religious groups, military units, colleges, academic departments, supermarkets, business associations, and so on.

Aspects of social organization include sex composition, leadership style, organizational structure, communication systems, and so on (Susan & Wheelan, 2005; Chapais & Berman, 2004). Lim, Putnam, and Robert (2010) note that well-known social organizations and religious institutions are among them.

Moody, White, and Douglas (2003) say that social organizations are hierarchical, and Hasmath, Hildebrandt, and Hsu (2016) say that social organizations can take different forms; for example, they can be created by institutions such as schools or governments.

Organizations are studied in several social science fields, including sociology, economics, political science, psychology, management, and organizational communication (Douma & Schreuder, 2013).

Organizations differ from groups in that they are more formal and better structured. A researcher might want to study a company in order to generalize the results to the whole population of companies.

An organization can be characterized by its number of employees, net annual revenue, net assets, number of projects, and so on. A researcher might want to know, for example, whether big companies hire more or fewer women than small companies.

Organization researchers might be interested in how companies like Reliance, Amazon, and HCL affect our social and economic lives. People who work in business often study business organizations.

Social Level

The social level has two types:

Social Artifacts Level

At this level, things are studied alongside humans. Social artifacts are human-made objects from diverse communities: items, representations, assemblages, institutions, knowledge, and conceptual frameworks used to convey, interpret, or achieve a goal (IGI Global, 2017).

Cultural artifacts are anything humans generate that reveals their culture (Watts, 1981).

Social artifacts include books, newspapers, advertisements, websites, technical devices, films, photographs, paintings, clothes, poems, jokes, students’ late excuses, scientific breakthroughs, furniture, machines, structures, and so on; the list is effectively infinite.

Humans create social artifacts for social behavior. Just as people or groups imply a population in business research, each social artifact implies a class of objects.

Business books, magazines, articles, and case studies are goods of the same class. In a research study, a business magazine might be characterized by its number of articles, publication frequency, price, content, and editor.

The population of related magazines might then be examined for description and explanation. Marx W. Wartofsky (1979) classified artifacts into primary artifacts used in production (like a camera), secondary artifacts related to primary artifacts (like a camera user manual), and tertiary artifacts related to representations of secondary artifacts (like a sculpture of a camera user manual).

The scientific study of an artifact reveals much about its creators and users. A researcher studying artifacts may be interested in their advertising, marketing, distribution, purchase, and so on.

Social Interaction Level

Social interactions form another unit of analysis at the social level. Examples include:

  • Eye contact with a coworker
  • Buying something in a store
  • Friendship decisions
  • Road accidents
  • Airline hijackings
  • Professional counseling
  • WhatsApp messaging

A researcher might study young employees’ smartphone addiction. Some addictions may involve social media, while others involve online games and movies that inhibit connection.

Here, smartphone addiction is examined as a societal phenomenon, while the units of observation are probably individuals (employees).

Anthropologists typically study social artifacts and may be interested in the social order. A researcher who examines social interactions may be interested in how broader societal structures and factors shape daily behavior, festivals, and weddings.


Even though there is no perfect way to do research, it is generally agreed that researchers should try to choose a unit of analysis that preserves the context needed to make sense of the data.

Researchers should consider the details of their research when deciding on the unit of analysis. 

They should keep in mind that consistent use of these units throughout the analysis process (from coding to developing categories and themes to interpreting the data) is essential to gaining insight from qualitative data and protecting the reliability of the results.

Meta-Analysis – Guide with Definition, Steps & Examples

Published by Owen Ingram on April 26th, 2023, revised on April 26, 2023

“A meta-analysis is a formal, epidemiological, quantitative study design that uses statistical methods to generalise the findings of the selected independent studies.”

Meta-analysis and systematic review are two of the most authoritative strategies in research. When researchers start looking for the best available evidence concerning their research work, they are advised to begin at the top of the evidence pyramid. Evidence in the form of meta-analyses or systematic reviews addressing important questions is significant in academia because it informs decision-making.

What is Meta-Analysis?

Meta-analysis estimates an overall effect by systematically synthesising or merging the results of individual, independent research studies. Meta-analysis isn’t only about reaching a wider population by combining several smaller studies. It involves systematic methods to evaluate inconsistencies in participants, variability (also known as heterogeneity), and findings, and to check how sensitive the results are to the selected systematic review protocol.

When Should you Conduct a Meta-Analysis?

Meta-analysis has become a widely used research method in medical sciences and other fields for several reasons. The technique involves statistically summarising the results of the independent studies identified through a systematic review.

The Cochrane Handbook explains that “an important step in a systematic review is the thoughtful consideration of whether it is appropriate to combine the numerical results of all, or perhaps some, of the studies. Such a meta-analysis yields an overall statistic (together with its confidence interval) that summarizes the effectiveness of an experimental intervention compared with a comparator intervention” (section 10.2).

A researcher or a practitioner should choose meta-analysis when the following outcomes are desirable. 

To generate new hypotheses or settle controversies arising from conflicting research studies. Meta-analysis makes it possible to quantify and evaluate variable results and identify the extent of conflict in the literature.

To find research gaps left unfilled and address questions not posed by individual studies. Primary research studies involve specific types of participants and interventions. A review of these studies with variable characteristics and methodologies can allow the researcher to gauge the consistency of findings across a wider range of participants and interventions. With the help of meta-analysis, the reasons for differences in the effect can also be explored. 

To provide convincing evidence. Estimating effects with a larger combined sample size and a wider range of interventions can provide convincing evidence. Many academic studies are based on very small datasets, so their estimated intervention effects are not fully reliable in isolation.

Elements of a Meta-Analysis

Deeks et al. (2019), Haidich (2010), and Grant & Booth (2009) explored the characteristics, strengths, and weaknesses of meta-analysis. These are briefly explained below.

Characteristics: 

  • A systematic review must be completed before conducting the meta-analysis because it provides a summary of the findings of the individual studies synthesised. 
  • You can only conduct a meta-analysis by synthesising studies in a systematic review. 
  • The studies selected for statistical analysis for the purpose of meta-analysis should be similar in terms of comparison, intervention, and population. 

Strengths: 

  • A meta-analysis takes place after the systematic review. The end product is a comprehensive quantitative analysis that is complex but reliable. 
  • It gives more value and weight to existing studies that do not hold practical value on their own. 
  • Policy-makers and academics cannot base their decisions on individual research studies. Meta-analysis provides them with a comprehensive and robust analysis of the evidence on which to make informed decisions. 

Criticisms: 

  • The meta-analysis uses studies exploring similar topics. Finding similar studies for the meta-analysis can be challenging.
  • When and if biases in the individual studies or those related to reporting and specific research methodologies are involved, the meta-analysis results could be misleading.

Steps of Conducting the Meta-Analysis 

The process of conducting the meta-analysis has remained a topic of debate among researchers and scientists. However, the following 5-step process is widely accepted. 

Step 1: Research Question

The first step in conducting clinical research involves identifying a research question and proposing a hypothesis . The potential clinical significance of the research question is then explained, and the study design and analytical plan are justified.

Step 2: Systematic Review 

The purpose of a systematic review (SR) is to address a research question by identifying all relevant studies that meet the required quality standards for inclusion. While established journals typically serve as the primary source for identified studies, it is important to also consider unpublished data to avoid publication bias or the exclusion of studies with negative results.

While some meta-analyses may limit their focus to randomized controlled trials (RCTs) for the sake of obtaining the highest quality evidence, other experimental and quasi-experimental studies may be included if they meet the specific inclusion/exclusion criteria established for the review.

Step 3: Data Extraction

After selecting studies for the meta-analysis, researchers extract summary data or outcomes, as well as sample sizes and measures of data variability for both intervention and control groups. The choice of outcome measures depends on the research question and the type of study, and may include numerical or categorical measures.

For instance, numerical means may be used to report differences in scores on a questionnaire or changes in a measurement, such as blood pressure. In contrast, risk measures like odds ratios (OR) or relative risks (RR) are typically used to report differences in the probability of belonging to one category or another, such as vaginal birth versus cesarean birth.

Step 4: Standardisation and Weighting Studies

After gathering all the required data, the fourth step involves computing suitable summary measures from each study for further examination. These measures are typically referred to as Effect Sizes and indicate the difference in average scores between the control and intervention groups. For instance, it could be the variation in blood pressure changes between study participants who used drug X and those who used a placebo.

Since the units of measurement often differ across the included studies, standardization is necessary to create comparable effect size estimates. Standardization is accomplished by determining, for each study, the average score for the intervention group, subtracting the average score for the control group, and dividing the result by the relevant measure of variability in that dataset.

In some cases, the results of certain studies must carry more significance than others. Larger studies, as measured by their sample sizes, are deemed to produce more precise estimates of effect size than smaller studies. Additionally, studies with less variability in data, such as smaller standard deviation or narrower confidence intervals, are typically regarded as higher quality in study design. A weighting statistic that aims to incorporate both of these factors, known as inverse variance, is commonly employed.
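To make Step 4 concrete, here is a minimal Python sketch (not from the original guide; the study numbers are invented for illustration) of how standardized mean differences and inverse-variance weights might be computed from per-study summary statistics:

```python
import numpy as np

# Hypothetical per-study summary data: mean change in blood pressure,
# standard deviation, and sample size for intervention and control groups.
studies = [
    # (mean_int, sd_int, n_int, mean_ctl, sd_ctl, n_ctl)
    (-8.1, 9.5, 60, -2.3, 9.8, 58),
    (-6.4, 11.0, 35, -1.9, 10.5, 34),
    (-9.0, 8.7, 120, -3.1, 9.1, 118),
]

effect_sizes, variances = [], []
for m1, s1, n1, m2, s2, n2 in studies:
    # Pooled standard deviation across the two groups.
    sp = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp                                       # standardized mean difference
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))   # approximate variance of d
    effect_sizes.append(d)
    variances.append(var_d)

weights = [1 / v for v in variances]  # inverse-variance weights
for d, w in zip(effect_sizes, weights):
    print(f"effect size = {d:+.2f}, weight = {w:.1f}")
```

Each study's weight is simply the reciprocal of its effect-size variance, so larger and less variable studies contribute more to the pooled estimate.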

Step 5: Absolute Effect Estimation

The ultimate step in conducting a meta-analysis is to choose and utilize an appropriate model for comparing Effect Sizes among diverse studies. Two popular models for this purpose are the Fixed Effects and Random Effects models. The Fixed Effects model relies on the premise that each study is evaluating a common treatment effect, implying that all studies would have estimated the same Effect Size if sample variability were equal across all studies.

Conversely, the Random Effects model posits that the true treatment effects in individual studies may vary from each other, and endeavors to consider this additional source of interstudy variation in Effect Sizes. The existence and magnitude of this latter variability is usually evaluated within the meta-analysis through a test for ‘heterogeneity.’
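Continuing the hypothetical example above, the sketch below shows how fixed-effect and DerSimonian-Laird random-effects pooled estimates might be computed from per-study effect sizes and variances. It is an illustrative implementation, not the article's own code; it also computes Cochran's Q and the I² statistic, the heterogeneity measures discussed later under "Methods and Assumptions in Meta-Analysis".

```python
import numpy as np
from scipy import stats

def pool(effects, variances):
    """Pool effect sizes under fixed-effect and DerSimonian-Laird random-effects models."""
    y = np.asarray(effects)
    v = np.asarray(variances)
    w = 1 / v

    # Fixed-effect estimate: inverse-variance weighted average.
    fixed = np.sum(w * y) / np.sum(w)
    se_fixed = np.sqrt(1 / np.sum(w))

    # Cochran's Q and the DerSimonian-Laird estimate of between-study variance tau^2.
    q = np.sum(w * (y - fixed) ** 2)
    df = len(y) - 1
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects estimate: weights deflated by the between-study variance.
    w_star = 1 / (v + tau2)
    random = np.sum(w_star * y) / np.sum(w_star)
    se_random = np.sqrt(1 / np.sum(w_star))

    z = stats.norm.ppf(0.975)  # multiplier for a 95% confidence interval
    return {
        "fixed": (fixed, fixed - z * se_fixed, fixed + z * se_fixed),
        "random": (random, random - z * se_random, random + z * se_random),
        "Q": q, "I2": i2, "tau2": tau2,
    }

# Effect sizes and variances roughly matching the hypothetical studies above.
print(pool([-0.59, -0.42, -0.66], [0.035, 0.060, 0.018]))
```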

Forest Plot

The results of a meta-analysis are often visually presented using a “Forest Plot”. This type of plot displays, for each study included in the analysis, a horizontal line that indicates the standardized Effect Size estimate and 95% confidence interval for the risk ratio used. Figure A provides an example of a hypothetical Forest Plot in which drug X reduces the risk of death in all three studies.

However, the first study was larger than the other two, and the estimates for the two smaller studies were not statistically significant. This is indicated by the lines emanating from their boxes crossing the value of 1. The size of each box represents the relative weight assigned to that study by the meta-analysis. The diamond represents the combined estimate of the drug’s effect, which is more precise; its centre and width indicate the combined risk ratio estimate and the limits of the 95% confidence interval.

Figure A: Hypothetical Forest Plot
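A plot of this kind can be drawn with standard charting libraries. The matplotlib sketch below uses invented risk ratios purely to illustrate the layout: a square sized by study weight, a horizontal line for each 95% confidence interval, a dashed line of no effect at 1, and a diamond for the pooled estimate.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical risk ratios, 95% CIs, and weights for three studies plus the pooled estimate.
labels  = ["Study 1", "Study 2", "Study 3", "Pooled"]
rr      = [0.80, 0.85, 0.75, 0.81]
ci_low  = [0.70, 0.60, 0.50, 0.72]
ci_high = [0.91, 1.20, 1.12, 0.90]
weight  = [60, 20, 20, None]  # relative weights; the pooled row is drawn as a diamond

fig, ax = plt.subplots(figsize=(6, 3))
ys = np.arange(len(labels))[::-1]
for y, est, lo, hi, w in zip(ys, rr, ci_low, ci_high, weight):
    ax.plot([lo, hi], [y, y], color="black")                      # confidence interval
    if w is not None:
        ax.scatter(est, y, s=w * 4, marker="s", color="black")    # box sized by weight
    else:
        ax.scatter(est, y, s=120, marker="D", color="black")      # pooled estimate diamond

ax.axvline(1.0, linestyle="--", color="grey")  # line of no effect (risk ratio = 1)
ax.set_yticks(ys)
ax.set_yticklabels(labels)
ax.set_xscale("log")
ax.set_xlabel("Risk ratio (log scale)")
plt.tight_layout()
plt.show()
```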

Relevance to Practice and Research 

Evidence Based Nursing commentaries often include recently published systematic reviews and meta-analyses, as they can provide new insights and strengthen recommendations for effective healthcare practices. Additionally, they can identify gaps or limitations in current evidence and guide future research directions.

The quality of the data available for synthesis is a critical factor in the strength of conclusions drawn from meta-analyses, and this is influenced by the quality of individual studies and the systematic review itself. However, meta-analysis cannot overcome issues related to underpowered or poorly designed studies.

Therefore, clinicians may still encounter situations where the evidence is weak or uncertain, and where higher-quality research is required to improve clinical decision-making. While such findings can be frustrating, they remain important for informing practice and highlighting the need for further research to fill gaps in the evidence base.

Methods and Assumptions in Meta-Analysis 

Ensuring the credibility of findings is imperative in all types of research, including meta-analyses. To validate the outcomes of a meta-analysis, the researcher must confirm that the research techniques used were accurate in measuring the intended variables. Typically, researchers establish the validity of a meta-analysis by testing the outcomes for homogeneity or the degree of similarity between the results of the combined studies.

Homogeneity is preferred in meta-analyses as it allows the data to be combined without needing adjustments to suit the study’s requirements. To determine homogeneity, researchers assess heterogeneity, its opposite. Two widely used statistical methods for evaluating heterogeneity in research results are Cochran’s Q and the I² index.

Difference Between Meta-Analysis and Systematic Reviews

Meta-analysis and systematic reviews are both research methods used to synthesise evidence from multiple studies on a particular topic. However, there are some key differences between the two.

Systematic reviews involve a comprehensive and structured approach to identifying, selecting, and critically appraising all available evidence relevant to a specific research question. This process involves searching multiple databases, screening the identified studies for relevance and quality, and summarizing the findings in a narrative report.

Meta-analysis, on the other hand, involves using statistical methods to combine and analyze the data from multiple studies, with the aim of producing a quantitative summary of the overall effect size. Meta-analysis requires the studies to be similar enough in terms of their design, methodology, and outcome measures to allow for meaningful comparison and analysis.

Therefore, systematic reviews are broader in scope and summarize the findings of all studies on a topic, while meta-analyses are more focused on producing a quantitative estimate of the effect size of an intervention across multiple studies that meet certain criteria. In some cases, a systematic review may be conducted without a meta-analysis if the studies are too diverse or the quality of the data is not sufficient to allow for statistical pooling.

Software Packages For Meta-Analysis

Meta-analysis can be done through software packages, including free and paid options. One of the most commonly used software packages for meta-analysis is RevMan by the Cochrane Collaboration.

Assessing the Quality of Meta-Analysis 

Assessing the quality of a meta-analysis involves evaluating the methods used to conduct the analysis and the quality of the studies included. Here are some key factors to consider:

  • Study selection: The studies included in the meta-analysis should be relevant to the research question and meet predetermined criteria for quality.
  • Search strategy: The search strategy should be comprehensive and transparent, including databases and search terms used to identify relevant studies.
  • Study quality assessment: The quality of included studies should be assessed using appropriate tools, and this assessment should be reported in the meta-analysis.
  • Data extraction: The data extraction process should be systematic and clearly reported, including any discrepancies that arose.
  • Analysis methods: The meta-analysis should use appropriate statistical methods to combine the results of the included studies, and these methods should be transparently reported.
  • Publication bias: The potential for publication bias should be assessed and reported in the meta-analysis, including any efforts to identify and include unpublished studies.
  • Interpretation of results: The results should be interpreted in the context of the study limitations and the overall quality of the evidence.
  • Sensitivity analysis: Sensitivity analysis should be conducted to evaluate the impact of study quality, inclusion criteria, and other factors on the overall results.

Overall, a high-quality meta-analysis should be transparent in its methods and clearly report the included studies’ limitations and the evidence’s overall quality.

Examples of Meta-Analysis

  • Stanley, T. D., & Jarrell, S. B. (1989). Meta-regression analysis: A quantitative method of literature surveys. Journal of Economic Surveys, 3(2), 161–170.
  • Datta, D. K., Pinches, G. E., & Narayanan, V. K. (1992). Factors influencing wealth creation from mergers and acquisitions: A meta-analysis. Strategic Management Journal, 13, 67–84.
  • Glass, G. (1983). Synthesising empirical research: Meta-analysis. In S. A. Ward & L. J. Reed (Eds.), Knowledge structure and use: Implications for synthesis and interpretation. Philadelphia: Temple University Press.
  • Wolf, F. M. (1986). Meta-analysis: Quantitative methods for research synthesis. Sage University Paper No. 59.
  • Hunter, J. E., Schmidt, F. L., & Jackson, G. B. (1982). Meta-analysis: Cumulating research findings across studies. Beverly Hills, CA: Sage.

Frequently Asked Questions

What is a meta-analysis in research?

Meta-analysis is a statistical method used to combine results from multiple studies on a specific topic. By pooling data from various sources, meta-analysis can provide a more precise estimate of the effect size of a treatment or intervention and identify areas for future research.

Why is meta-analysis important?

Meta-analysis is important because it combines and summarizes results from multiple studies to provide a more precise and reliable estimate of the effect of a treatment or intervention. This helps clinicians and policymakers make evidence-based decisions and identify areas for further research.

What is an example of a meta-analysis?

A meta-analysis of studies evaluating physical exercise’s effect on depression in adults is an example. Researchers gathered data from 49 studies involving a total of 2669 participants. The studies used different types of exercise and measures of depression, which made it difficult to compare the results.

Through meta-analysis, the researchers calculated an overall effect size and determined that exercise was associated with a statistically significant reduction in depression symptoms. The study also identified that moderate-intensity aerobic exercise, performed three to five times per week, was the most effective. The meta-analysis provided a more comprehensive understanding of the impact of exercise on depression than any single study could provide.

What is the definition of meta-analysis in clinical research?

Meta-analysis in clinical research is a statistical technique that combines data from multiple independent studies on a particular topic to generate a summary or “meta” estimate of the effect of a particular intervention or exposure.

This type of analysis allows researchers to synthesise the results of multiple studies, potentially increasing the statistical power and providing more precise estimates of treatment effects. Meta-analyses are commonly used in clinical research to evaluate the effectiveness and safety of medical interventions and to inform clinical practice guidelines.

Is meta-analysis qualitative or quantitative?

Meta-analysis is a quantitative method used to combine and analyze data from multiple studies. It involves the statistical synthesis of results from individual studies to obtain a pooled estimate of the effect size of a particular intervention or treatment. Therefore, meta-analysis is considered a quantitative approach to research synthesis.

Research Paper Analysis: How to Analyze a Research Article + Example

Why might you need to analyze research? First of all, when you analyze a research article, you begin to understand your assigned reading better. It is also the first step toward learning how to write your own research articles and literature reviews. However, if you have never written a research paper before, it may be difficult for you to analyze one. After all, you may not know what criteria to use to evaluate it. But don’t panic! We will help you figure it out!

In this article, our team has explained how to analyze research papers quickly and effectively. At the end, you will also find a research analysis paper example to see how everything works in practice.

  • 🔤 Research Analysis Definition
  • 📊 How to Analyze a Research Article
  • ✍️ How to Write a Research Analysis
  • 📝 Analysis Example
  • 🔎 More Examples
  • 🔗 References

🔤 Research Paper Analysis: What Is It?

A research paper analysis is an academic writing assignment in which you analyze a scholarly article’s methodology, data, and findings. In essence, “to analyze” means to break something down into components and assess each of them individually and in relation to each other. The goal of an analysis is to gain a deeper understanding of a subject. So, when you analyze a research article, you dissect it into elements like data sources , research methods, and results and evaluate how they contribute to the study’s strengths and weaknesses.

📋 Research Analysis Format

A research analysis paper has a pretty straightforward structure. Check it out below!

Research articles usually include the following sections: introduction, methods, results, and discussion. In the following paragraphs, we will discuss how to analyze a scientific article with a focus on each of its parts.

This image shows the main sections of a research article.

How to Analyze a Research Paper: Purpose

The purpose of the study is usually outlined in the introductory section of the article. Analyzing the research paper’s objectives is critical to establish the context for the rest of your analysis.

When analyzing the research aim, you should evaluate whether it was justified for the researchers to conduct the study. In other words, you should assess whether their research question was significant and whether it arose from existing literature on the topic.

Here are some questions that may help you analyze a research paper’s purpose:

  • Why was the research carried out?
  • What gaps does it try to fill, or what controversies does it try to settle?
  • How does the study contribute to its field?
  • Do you agree with the author’s justification for approaching this particular question in this way?

How to Analyze a Paper: Methods

When analyzing the methodology section , you should indicate the study’s research design (qualitative, quantitative, or mixed) and methods used (for example, experiment, case study, correlational research, survey, etc.). After that, you should assess whether these methods suit the research purpose. In other words, do the chosen methods allow scholars to answer their research questions within the scope of their study?

For example, if scholars wanted to study US students’ average satisfaction with their higher education experience, they could conduct a quantitative survey . However, if they wanted to gain an in-depth understanding of the factors influencing US students’ satisfaction with higher education, qualitative interviews would be more appropriate.

When analyzing methods, you should also look at the research sample . Did the scholars use randomization to select study participants? Was the sample big enough for the results to be generalizable to a larger population?

You can also answer the following questions in your methodology analysis:

  • Is the methodology valid? In other words, did the researchers use methods that accurately measure the variables of interest?
  • Is the research methodology reliable? A research method is reliable if it can produce stable and consistent results under the same circumstances.
  • Is the study biased in any way?
  • What are the limitations of the chosen methodology?

How to Analyze Research Articles’ Results

You should start the analysis of the article results by carefully reading the tables, figures, and text. Check whether the findings correspond to the initial research purpose. See whether the results answered the author’s research questions or supported the hypotheses stated in the introduction.

To analyze the results section effectively, answer the following questions:

  • What are the major findings of the study?
  • Did the author present the results clearly and unambiguously?
  • Are the findings statistically significant?
  • Does the author provide sufficient information on the validity and reliability of the results?
  • Have you noticed any trends or patterns in the data that the author did not mention?

How to Analyze Research: Discussion

Finally, you should analyze the authors’ interpretation of results and its connection with research objectives. Examine what conclusions the authors drew from their study and whether these conclusions answer the original question.

You should also pay attention to how the authors used findings to support their conclusions. For example, you can reflect on why their findings support that particular inference and not another one. Moreover, more than one conclusion can sometimes be made based on the same set of results. If that’s the case with your article, you should analyze whether the authors addressed other interpretations of their findings .

Here are some useful questions you can use to analyze the discussion section:

  • What findings did the authors use to support their conclusions?
  • How do the researchers’ conclusions compare to other studies’ findings?
  • How does this study contribute to its field?
  • What future research directions do the authors suggest?
  • What additional insights can you share regarding this article? For example, do you agree with the results? What other questions could the researchers have answered?

This image shows how to analyze a research article.

Now, you know how to analyze an article that presents research findings. However, it’s just a part of the work you have to do to complete your paper. So, it’s time to learn how to write a research analysis! Check out the steps below!

1. Introduce the Article

As with most academic assignments, you should start your research article analysis with an introduction. Here’s what it should include:

  • The article’s publication details . Specify the title of the scholarly work you are analyzing, its authors, and publication date. Remember to enclose the article’s title in quotation marks and write it in title case .
  • The article’s main point . State what the paper is about. What did the authors study, and what was their major finding?
  • Your thesis statement . End your introduction with a strong claim summarizing your evaluation of the article. Consider briefly outlining the research paper’s strengths, weaknesses, and significance in your thesis.

Keep your introduction brief. Save the word count for the “meat” of your paper — that is, for the analysis.

2. Summarize the Article

Now, you should write a brief and focused summary of the scientific article. It should be shorter than your analysis section and contain all the relevant details about the research paper.

Here’s what you should include in your summary:

  • The research purpose . Briefly explain why the research was done. Identify the authors’ purpose and research questions or hypotheses .
  • Methods and results . Summarize what happened in the study. State only facts, without the authors’ interpretations of them. Avoid using too many numbers and details; instead, include only the information that will help readers understand what happened.
  • The authors’ conclusions . Outline what conclusions the researchers made from their study. In other words, describe how the authors explained the meaning of their findings.

If you need help summarizing an article, you can use our free summary generator .

3. Write Your Research Analysis

The analysis of the study is the most crucial part of this assignment type. Its key goal is to evaluate the article critically and demonstrate your understanding of it.

We’ve already covered how to analyze a research article in the section above. Here’s a quick recap:

  • Analyze whether the study’s purpose is significant and relevant.
  • Examine whether the chosen methodology allows for answering the research questions.
  • Evaluate how the authors presented the results.
  • Assess whether the authors’ conclusions are grounded in findings and answer the original research questions.

Although you should analyze the article critically, it doesn’t mean you only should criticize it. If the authors did a good job designing and conducting their study, be sure to explain why you think their work is well done. Also, it is a great idea to provide examples from the article to support your analysis.

4. Conclude Your Analysis of Research Paper

A conclusion is your chance to reflect on the study’s relevance and importance. Explain how the analyzed paper can contribute to the existing knowledge or lead to future research. Also, you need to summarize your thoughts on the article as a whole. Avoid making value judgments — saying that the paper is “good” or “bad.” Instead, use more descriptive words and phrases such as “This paper effectively showed…”

Need help writing a compelling conclusion? Try our free essay conclusion generator !

5. Revise and Proofread

Last but not least, you should carefully proofread your paper to find any punctuation, grammar, and spelling mistakes. Start by reading your work out loud to ensure that your sentences fit together and sound cohesive. Also, it can be helpful to ask your professor or peer to read your work and highlight possible weaknesses or typos.

This image shows how to write a research analysis.

📝 Research Paper Analysis Example

We have prepared an analysis of a research paper example to show how everything works in practice.

No Homework Policy: Research Article Analysis Example

This paper aims to analyze the research article entitled “No Assignment: A Boon or a Bane?” by Cordova, Pagtulon-an, and Tan (2019). This study examined the effects of having and not having assignments on weekends on high school students’ performance and transmuted mean scores. This article effectively shows the value of homework for students, but larger studies are needed to support its findings.

Cordova et al. (2019) conducted a descriptive quantitative study using a sample of 115 Grade 11 students of the Central Mindanao University Laboratory High School in the Philippines. The sample was divided into two groups: the first received homework on weekends, while the second didn’t. The researchers compared students’ performance records made by teachers and found that students who received assignments performed better than their counterparts without homework.

The purpose of this study is highly relevant and justified, as this research was conducted in response to the debates about the “No Homework Policy” in the Philippines. Although the descriptive research design used by the authors allows them to answer the research question, the study could benefit from an experimental design, which would give the authors firm control over variables. Additionally, the study’s sample size was not large enough for the findings to be generalized to a larger population.

The study results are presented clearly, logically, and comprehensively and correspond to the research objectives. The researchers found that students’ mean grades decreased in the group without homework and increased in the group with homework. Based on these findings, the authors concluded that homework positively affected students’ performance. This conclusion is logical and grounded in data.

This research effectively showed the importance of homework for students’ performance. Yet, since the sample size was relatively small, larger studies are needed to ensure the authors’ conclusions can be generalized to a larger population.

🔎 More Research Analysis Paper Examples

Do you want another research analysis example? Check out the best analysis research paper samples below:

  • Gracious Leadership Principles for Nurses: Article Analysis
  • Effective Mental Health Interventions: Analysis of an Article
  • Nursing Turnover: Article Analysis
  • Nursing Practice Issue: Qualitative Research Article Analysis
  • Quantitative Article Critique in Nursing
  • LIVE Program: Quantitative Article Critique
  • Evidence-Based Practice Beliefs and Implementation: Article Critique
  • “Differential Effectiveness of Placebo Treatments”: Research Paper Analysis
  • “Family-Based Childhood Obesity Prevention Interventions”: Analysis Research Paper Example
  • “Childhood Obesity Risk in Overweight Mothers”: Article Analysis
  • “Fostering Early Breast Cancer Detection” Article Analysis
  • Lesson Planning for Diversity: Analysis of an Article
  • Journal Article Review: Correlates of Physical Violence at School
  • Space and the Atom: Article Analysis
  • “Democracy and Collective Identity in the EU and the USA”: Article Analysis
  • China’s Hegemonic Prospects: Article Review
  • Article Analysis: Fear of Missing Out
  • Article Analysis: “Perceptions of ADHD Among Diagnosed Children and Their Parents”
  • Codependence, Narcissism, and Childhood Trauma: Analysis of the Article
  • Relationship Between Work Intensity, Workaholism, Burnout, and MSC: Article Review

We hope that our article on research paper analysis has been helpful. If you liked it, please share this article with your friends!

  • Analyzing Research Articles: A Guide for Readers and Writers | Sam Mathews
  • Summary and Analysis of Scientific Research Articles | San José State University Writing Center
  • Analyzing Scholarly Articles | Texas A&M University
  • Article Analysis Assignment | University of Wisconsin-Madison
  • How to Summarize a Research Article | University of Connecticut
  • Critique/Review of Research Articles | University of Calgary
  • Art of Reading a Journal Article: Methodically and Effectively | PubMed Central
  • Write a Critical Review of a Scientific Journal Article | McLaughlin Library
  • How to Read and Understand a Scientific Paper: A Guide for Non-scientists | LSE
  • How to Analyze Journal Articles | Classroom

What is data analysis? Examples and how to get started


Even with years of professional experience working with data, the term "data analysis" still sets off a panic button in my soul. And yes, when it comes to serious data analysis for your business, you'll eventually want data scientists on your side. But if you're just getting started, no panic attacks are required.

Table of contents:

  • Quick review: What is data analysis?
  • Why is data analysis important?
  • Types of data analysis (with examples)
  • Data analysis process: How to get started
  • Frequently asked questions


Data analysis is the process of examining, filtering, adapting, and modeling data to help solve problems. Data analysis helps determine what is and isn't working, so you can make the changes needed to achieve your business goals. 

Keep in mind that data analysis includes analyzing both quantitative data (e.g., profits and sales) and qualitative data (e.g., surveys and case studies) to paint the whole picture. Here are two simple examples (of a nuanced topic) to show you what I mean.

An example of quantitative data analysis is an online jewelry store owner using inventory data to forecast and improve reordering accuracy. The owner looks at their sales from the past six months and sees that, on average, they sold 210 gold pieces and 105 silver pieces per month, but they only had 100 gold pieces and 100 silver pieces in stock. By collecting and analyzing inventory data on these SKUs, they're forecasting to improve reordering accuracy. The next time they order inventory, they order twice as many gold pieces as silver to meet customer demand.

An example of qualitative data analysis is a fitness studio owner collecting customer feedback to improve class offerings. The studio owner sends out an open-ended survey asking customers what types of exercises they enjoy the most. The owner then performs qualitative content analysis to identify the most frequently suggested exercises and incorporates these into future workout classes.

Here's why it's worth implementing data analysis for your business:

Understand your target audience: You might think you know how to best target your audience, but are your assumptions backed by data? Data analysis can help answer questions like, "What demographics define my target audience?" or "What is my audience motivated by?"

Inform decisions: You don't need to toss and turn over a decision when the data points clearly to the answer. For instance, a restaurant could analyze which dishes on the menu are selling the most, helping them decide which ones to keep and which ones to change.

Adjust budgets: Similarly, data analysis can highlight areas in your business that are performing well and are worth investing more in, as well as areas that aren't generating enough revenue and should be cut. For example, a B2B software company might discover their product for enterprises is thriving while their small business solution lags behind. This discovery could prompt them to allocate more budget toward the enterprise product, resulting in better resource utilization.

Identify and solve problems: Let's say a cell phone manufacturer notices data showing a lot of customers returning a certain model. When they investigate, they find that model also happens to have the highest number of crashes. Once they identify and solve the technical issue, they can reduce the number of returns.

There are five main types of data analysis—with increasingly scary-sounding names. Each one serves a different purpose, so take a look to see which makes the most sense for your situation. It's ok if you can't pronounce the one you choose. 

Image: the five types of data analysis (text, statistical, diagnostic, predictive, and prescriptive analysis).

Text analysis: What is happening?

Text analysis, also known as text mining, involves pulling insights from large amounts of unstructured, text-based data sources: emails, social media, support tickets, reviews, and so on. You would use text analysis when the volume of data is too large to sift through manually. 

Here are a few methods used to perform text analysis, to give you a sense of how it's different from a human reading through the text: 

Word frequency identifies the most frequently used words. For example, a restaurant monitors social media mentions and measures the frequency of positive and negative keywords like "delicious" or "expensive" to determine how customers feel about their experience. 

Language detection indicates the language of text. For example, a global software company may use language detection on support tickets to connect customers with the appropriate agent. 

Keyword extraction automatically identifies the most used terms. For example, instead of sifting through thousands of reviews, a popular brand uses a keyword extractor to summarize the words or phrases that are most relevant. 

Because text analysis is based on words, not numbers, it's a bit more subjective. Words can have multiple meanings, of course, and Gen Z makes things even tougher with constant coinage. Natural language processing (NLP) software will help you get the most accurate text analysis, but it's rarely as objective as numerical analysis. 
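As a toy illustration of the word-frequency method described above (a sketch only, not a production NLP pipeline), you could count keyword mentions across a batch of reviews like this:

```python
from collections import Counter
import re

reviews = [
    "The pasta was delicious but the service was slow.",
    "Delicious food, friendly staff, a bit expensive though.",
    "Expensive for what you get. The dessert was delicious.",
]

# Tokenize each review into lowercase words and count them across all reviews.
words = Counter()
for review in reviews:
    words.update(re.findall(r"[a-z']+", review.lower()))

keywords = ["delicious", "expensive", "slow", "friendly"]
for kw in keywords:
    print(f"{kw}: {words[kw]}")
```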

Statistical analysis: What happened?

Statistical analysis pulls past data to identify meaningful trends. Two primary categories of statistical analysis exist: descriptive and inferential.

Descriptive analysis

Descriptive analysis looks at numerical data and calculations to determine what happened in a business. Companies use descriptive analysis to determine customer satisfaction, track campaigns, generate reports, and evaluate performance. 

Here are a few methods used to perform descriptive analysis: 

Measures of frequency identify how frequently an event occurs. For example, a popular coffee chain sends out a survey asking customers what their favorite holiday drink is and uses measures of frequency to determine how often a particular drink is selected. 

Measures of central tendency use mean, median, and mode to identify results. For example, a dating app company might use measures of central tendency to determine the average age of its users.

Measures of dispersion measure how data is distributed across a range. For example, HR may use measures of dispersion to determine what salary to offer in a given field. 
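For instance, a minimal sketch of these descriptive measures using Python's built-in statistics module (with made-up survey ages) might look like this:

```python
import statistics

# Hypothetical ages reported by users of a dating app.
ages = [24, 27, 29, 31, 27, 35, 41, 27, 30, 33]

print("mean:  ", statistics.mean(ages))             # central tendency
print("median:", statistics.median(ages))
print("mode:  ", statistics.mode(ages))
print("stdev: ", round(statistics.stdev(ages), 2))  # dispersion
print("range: ", max(ages) - min(ages))
```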

Inferential analysis

Inferential analysis uses a sample of data to draw conclusions about a much larger population. This type of analysis is used when the population you're interested in analyzing is very large. 

Here are a few methods used when performing inferential analysis: 

Hypothesis testing identifies which variables impact a particular topic. For example, a business uses hypothesis testing to determine if increased sales were the result of a specific marketing campaign. 

Confidence intervals indicate how accurate an estimate is. For example, a company using market research to survey customers about a new product may want to determine how confident they are that the individuals surveyed make up their target market. 

Regression analysis shows the effect of independent variables on a dependent variable. For example, a rental car company may use regression analysis to determine the relationship between wait times and number of bad reviews. 
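Here is a brief sketch of two of these methods using scipy (all numbers are fabricated for illustration): a two-sample t-test for a campaign effect and a simple linear regression of bad reviews on wait time.

```python
import numpy as np
from scipy import stats

# Hypothesis test: did daily sales differ during the marketing campaign?
sales_before = np.array([120, 135, 128, 140, 132, 125, 138])
sales_during = np.array([142, 151, 138, 160, 149, 155, 147])
t_stat, p_value = stats.ttest_ind(sales_during, sales_before)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Regression: how do average wait times relate to the number of bad reviews?
wait_minutes = np.array([5, 10, 15, 20, 25, 30])
bad_reviews  = np.array([1, 2, 2, 4, 6, 7])
result = stats.linregress(wait_minutes, bad_reviews)
print(f"slope = {result.slope:.2f} bad reviews per extra minute, r^2 = {result.rvalue**2:.2f}")
```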

Diagnostic analysis: Why did it happen?

Diagnostic analysis, also referred to as root cause analysis, uncovers the causes of certain events or results. 

Here are a few methods used to perform diagnostic analysis: 

Time-series analysis analyzes data collected over a period of time. A retail store may use time-series analysis to determine that sales increase between October and December every year. 

Data drilling uses business intelligence (BI) to show a more detailed view of data. For example, a business owner could use data drilling to see a detailed view of sales by state to determine if certain regions are driving increased sales.

Correlation analysis determines the strength of the relationship between variables. For example, a local ice cream shop may determine that as the temperature in the area rises, so do ice cream sales. 
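The ice cream example could be checked with a short correlation calculation (again with invented numbers):

```python
import numpy as np

temperature_c = np.array([18, 21, 24, 27, 30, 33])           # daily high temperature
ice_cream_sales = np.array([310, 355, 420, 480, 540, 600])   # units sold that day

# Pearson correlation coefficient between temperature and sales.
r = np.corrcoef(temperature_c, ice_cream_sales)[0, 1]
print(f"correlation: {r:.2f}")  # values near +1 indicate a strong positive relationship
```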

Predictive analysis: What is likely to happen?

Predictive analysis aims to anticipate future developments and events. By analyzing past data, companies can predict future scenarios and make strategic decisions.  

Here are a few methods used to perform predictive analysis: 

Machine learning uses AI and algorithms to predict outcomes. For example, search engines employ machine learning to recommend products to online shoppers that they are likely to buy based on their browsing history. 

Decision trees map out possible courses of action and outcomes. For example, a business may use a decision tree when deciding whether to downsize or expand. 
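As a toy sketch of the decision-tree idea (hypothetical data; scikit-learn assumed to be available), a business might model whether past expansion decisions paid off and then score the current situation:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical historical records: [monthly_revenue_growth_%, cash_reserve_months, churn_%]
X = [
    [8, 12, 2], [1, 3, 9], [6, 9, 4], [0, 2, 12],
    [9, 10, 3], [2, 4, 8], [7, 11, 2], [1, 5, 10],
]
# 1 = expansion succeeded, 0 = expansion struggled
y = [1, 0, 1, 0, 1, 0, 1, 0]

model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X, y)

# Predict the outcome for the current situation: 5% growth, 8 months of reserves, 5% churn.
print(model.predict([[5, 8, 5]]))
```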

Prescriptive analysis: What action should we take?

The highest level of analysis, prescriptive analysis, aims to find the best action plan. Typically, AI tools model different outcomes to predict the best approach. While these tools serve to provide insight, they don't replace human consideration, so always use your human brain before going with the conclusion of your prescriptive analysis. Otherwise, your GPS might drive you into a lake.

Here are a few methods used to perform prescriptive analysis: 

Lead scoring is used in sales departments to assign values to leads based on their perceived interest. For example, a sales team uses lead scoring to rank leads on a scale of 1-100 depending on the actions they take (e.g., opening an email or downloading an eBook). They then prioritize the leads that are most likely to convert. 

Algorithms are used in technology to perform specific tasks. For example, banks use prescriptive algorithms to monitor customers' spending and recommend that they deactivate their credit card if fraud is suspected. 
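A very simple rule-based lead-scoring sketch (the point values and actions are invented for illustration) might look like this:

```python
# Points assigned to actions a lead can take; the values are illustrative only.
ACTION_POINTS = {
    "opened_email": 5,
    "downloaded_ebook": 15,
    "visited_pricing_page": 25,
    "requested_demo": 40,
}

def score_lead(actions):
    """Return a 0-100 score based on the actions a lead has taken."""
    return min(100, sum(ACTION_POINTS.get(a, 0) for a in actions))

leads = {
    "lead_a": ["opened_email", "downloaded_ebook"],
    "lead_b": ["opened_email", "visited_pricing_page", "requested_demo"],
}

# Prioritize the leads most likely to convert.
for name, actions in sorted(leads.items(), key=lambda kv: -score_lead(kv[1])):
    print(name, score_lead(actions))
```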

The actual analysis is just one step in a much bigger process of using data to move your business forward. Here's a quick look at all the steps you need to take to make sure you're making informed decisions. 

Image: the data analysis process (data decision, data collection, data cleaning, data analysis, data interpretation, and data visualization).

Data decision

As with almost any project, the first step is to determine what problem you're trying to solve through data analysis. 

Make sure you get specific here. For example, a food delivery service may want to understand why customers are canceling their subscriptions. But to enable the most effective data analysis, they should pose a more targeted question, such as "How can we reduce customer churn without raising costs?" 

These questions will help you determine your KPIs and what type(s) of data analysis you'll conduct, so spend time honing the question—otherwise your analysis won't provide the actionable insights you want.

Data collection

Next, collect the required data from both internal and external sources. 

Internal data comes from within your business (think CRM software, internal reports, and archives), and helps you understand your business and processes.

External data originates from outside of the company (surveys, questionnaires, public data) and helps you understand your industry and your customers. 

You'll rely heavily on software for this part of the process. Your analytics or business dashboard tool, along with reports from any other internal tools like CRMs , will give you the internal data. For external data, you'll use survey apps and other data collection tools to get the information you need.

Data cleaning

Data can be seriously misleading if it's not clean. So before you analyze, make sure you review the data you collected.  Depending on the type of data you have, cleanup will look different, but it might include: 

Removing unnecessary information 

Addressing structural errors like misspellings

Deleting duplicates

Trimming whitespace

Human checking for accuracy 

You can use your spreadsheet's cleanup suggestions to quickly and effectively clean data, but a human review is always important.
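A small pandas sketch of a few of these cleanup steps (the column names and typo mapping are hypothetical) could look like this:

```python
import pandas as pd

df = pd.DataFrame({
    "customer": ["  Alice ", "Bob", "Bob", "Cara", None],
    "state":    ["NY", "CA", "CA", "Texsa", "WA"],   # "Texsa" is a misspelling
    "spend":    [120.0, 85.5, 85.5, 60.0, 42.0],
})

df = df.drop_duplicates()                            # delete duplicate rows
df["customer"] = df["customer"].str.strip()          # trim whitespace
df["state"] = df["state"].replace({"Texsa": "TX"})   # fix structural errors like misspellings
df = df.dropna(subset=["customer"])                  # remove rows missing required fields

print(df)
```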

Data analysis

Now that you've compiled and cleaned the data, use one or more of the above types of data analysis to find relationships, patterns, and trends. 

Data analysis tools can speed up the data analysis process and remove the risk of inevitable human error. Here are some examples.

Spreadsheets sort, filter, analyze, and visualize data. 

Business intelligence platforms model data and create dashboards. 

Structured query language (SQL) tools manage and extract data in relational databases. 

Data interpretation

After you analyze the data, you'll need to go back to the original question you posed and draw conclusions from your findings. Here are some common pitfalls to avoid:

Correlation vs. causation: Just because two variables are associated doesn't mean they're necessarily related or dependent on one another. 

Confirmation bias: This occurs when you interpret data in a way that confirms your own preconceived notions. To avoid this, have multiple people interpret the data. 

Small sample size: If your sample size is too small or doesn't represent the demographics of your customers, you may get misleading results. If you run into this, consider widening your sample size to give you a more accurate representation. 

Data visualization

Last but not least, visualizing the data in the form of graphs, maps, reports, charts, and dashboards can help you explain your findings to decision-makers and stakeholders. While it's not absolutely necessary, it will help tell the story of your data in a way that everyone in the business can understand and make decisions based on. 


Need a quick summary or still have a few nagging data analysis questions? I'm here for you.

What are the five types of data analysis?

The five types of data analysis are text analysis, statistical analysis, diagnostic analysis, predictive analysis, and prescriptive analysis. Each type offers a unique lens for understanding data: text analysis provides insights into text-based content, statistical analysis focuses on numerical trends, diagnostic analysis looks into problem causes, predictive analysis deals with what may happen in the future, and prescriptive analysis gives actionable recommendations.

What is the data analysis process?

The data analysis process involves data decision, collection, cleaning, analysis, interpretation, and visualization. Every stage comes together to transform raw data into meaningful insights. Decision determines what data to collect, collection gathers the relevant information, cleaning ensures accuracy, analysis uncovers patterns, interpretation assigns meaning, and visualization presents the insights.

What is the main purpose of data analysis?

In business, the main purpose of data analysis is to uncover patterns, trends, and anomalies, and then use that information to make decisions, solve problems, and reach your business goals.

Related reading: 

How to get started with data collection and analytics at your business

How to conduct your own market research survey

Automatically find and match related data across apps

How to build an analysis assistant with ChatGPT

What can the ChatGPT data analysis chatbot do?

This article was originally published in October 2022 and has since been updated with contributions from Cecilia Gillen. The most recent update was in September 2023.


MetaboAnalyst 6.0: towards a unified platform for metabolomics data processing, analysis and interpretation

Zhiqiang Pang, Yao Lu, Guangyan Zhou, Fiona Hui, Lei Xu, Charles Viau, Aliya F Spigelman, Patrick E MacDonald, David S Wishart, Shuzhao Li, Jianguo Xia. Nucleic Acids Research, 2024, gkae253, https://doi.org/10.1093/nar/gkae253

We introduce MetaboAnalyst version 6.0 as a unified platform for processing, analyzing, and interpreting data from targeted as well as untargeted metabolomics studies using liquid chromatography–mass spectrometry (LC–MS). The two main objectives in developing version 6.0 are to support tandem MS (MS2) data processing and annotation, and to support the analysis of data from exposomics studies and related experiments. Key features of MetaboAnalyst 6.0 include: (i) a significantly enhanced Spectra Processing module with support for MS2 data and the asari algorithm; (ii) an MS2 Peak Annotation module based on comprehensive MS2 reference databases with fragment-level annotation; (iii) a new Statistical Analysis module dedicated to handling complex study designs with multiple factors or phenotypic descriptors; (iv) a Causal Analysis module for estimating metabolite-phenotype causal relations based on two-sample Mendelian randomization; and (v) a Dose-Response Analysis module for benchmark dose calculations. In addition, we have improved MetaboAnalyst's visualization functions, updated its compound database and metabolite sets, and significantly expanded its pathway analysis support to around 130 species. MetaboAnalyst 6.0 is freely available at https://www.metaboanalyst.ca.


Metabolomics involves the comprehensive study of all small molecules in a biological system. It has diverse applications ranging from basic biochemical research to clinical investigation of diseases, food safety assessment, environmental monitoring, etc. ( 1–5 ). User-friendly and easily accessible bioinformatics tools are essential to deal with the complex data produced from metabolomics studies. MetaboAnalyst is a user-friendly, web-based platform developed to provide comprehensive support for metabolomics data analysis ( 6–10 ). The early versions (1.0–3.0) focused primarily on supporting statistical and functional analysis of targeted metabolomics data. Increasing support for untargeted metabolomics data from liquid chromatography–mass spectrometry (LC–MS) experiments has been gradually introduced in more recent versions of MetaboAnalyst. For instance, version 4.0 implemented a new module to support functional analysis directly from LC–MS peaks, while version 5.0 added an auto-optimized LC–MS spectral processing module that works seamlessly with the functional analysis module. A detailed protocol on how to use different modules for comprehensive analysis of untargeted metabolomics data was published in 2022 ( 11 ). According to Google Analytics, the MetaboAnalyst web server has processed over 2 million jobs, including 33 000 spectral processing jobs over the past 12 months. Many of these jobs are associated with untargeted metabolomics and exposomics studies.

Untargeted metabolomics data generated from high-resolution LC–MS instruments are typically characterized by thousands of peaks with unknown chemical identities. To assist with compound identification, tandem MS (called MS/MS or MS2) spectra are often collected from pooled QC samples during the experiments ( 12 ). The two commonly used MS2 methods are data-dependent acquisition (DDA) and data-independent acquisition (DIA), with sequential window acquisition of all theoretical mass spectra (SWATH) being a promising special case of the latter. DDA data usually have clear associations between the precursor ions and the corresponding MS2 spectra, while DIA data generally require deconvolution of the MS2 data to reconstruct associations with their precursor ions ( 13 ). Incorporating MS2 processing and annotation into untargeted metabolomics workflows can greatly improve compound annotations and functional interpretation.

Exposomics is an emerging field centered on profiling the complete set of exposures individuals encounter across their lifespan, which often involves MS analysis of chemical mixtures traditionally rooted in toxicology and public health ( 4 ). Untargeted LC–MS based metabolomics is increasingly applied to exposomics and toxicology studies. Exposomics data from human cohorts are often associated with complex phenotypic data due to the observational nature of these studies. This requires more sophisticated data analysis and visualization methods that can take multiple factors or covariates into consideration. Exposomics studies typically produce long lists of potential biomarkers that are significantly associated with phenotypes of interest. Identifying causal links from this large number of metabolite-phenotype relations is a natural next step. This has recently become possible with the availability of many metabolomic genome-wide association studies (mGWAS) that link metabolites and genotypes ( 14–16 ). By integrating mGWAS data with comparable GWAS data that associate genotypes with various phenotypes ( 17 ), we can now estimate causal relationships between a metabolite and a phenotype of interest through Mendelian randomization (MR) ( 18 ). Dose-response experiments are often performed to further quantify cause-and-effect relationships. These experiments are typically conducted at multiple dose levels using in vitro assays or animal models to derive dose-response curves for risk assessment of chemical exposures ( 19–21 ).

To address these emerging needs from both the metabolomics and exposomics communities, we have developed MetaboAnalyst version 6.0. This version includes many key features:

A significantly enhanced spectra processing workflow, with the addition of the asari algorithm for LC–MS spectra processing ( 22 ) as well as support for MS2 (DDA or SWATH-DIA) data processing.

A new module for MS2 spectral database searching for compound identification and results visualization.

A new module for causal analysis between metabolites and phenotypes of interest based on two-sample MR (2SMR).

A new module for dose-response analysis including dose-response curve fitting and benchmark dose (BMD) calculation.

A new module for statistical analysis with complex metadata.

A number of other important updates, including: improved functional analysis of untargeted metabolomics data by integrating MS2-based compound identification; an updated compound database, pathways and metabolite sets; and improved data visualization support across multiple modules.

MetaboAnalyst 6.0 is freely accessible at https://www.metaboanalyst.ca, with comprehensive documentation and updated tutorials. To better engage with our users, a dedicated user forum (https://omicsforum.ca) has been operational since May 2022. To date, this forum contains >4000 posts on ∼700 topics related to different aspects of using MetaboAnalyst.

MetaboAnalyst 6.0 accepts a total of five different data types across various modules encompassing spectra processing, statistical analysis, functional analysis, meta-analysis, and integration with other omics data. Once the data are uploaded, all analysis steps are conducted within a consistent framework including data integrity checks, parameter customization, and results visualization (Figure 1 ). Some of the key features in MetaboAnalyst 6.0 are described below.

Figure 1: MetaboAnalyst 6.0 workflow for targeted and untargeted metabolomics data. Multiple data input types are accepted. Untargeted metabolomics inputs require extra steps for spectra processing and peak annotation. The result table can be used for statistical and functional analysis within a consistent workflow in the same manner as for targeted metabolomics data.


LC–MS spectra processing remains an active research topic in the field of untargeted metabolomics. Many powerful tools have been developed over time, including XCMS ( 23 ), MZmine ( 24 ), MS-DIAL ( 13 ) and asari ( 22 ). In addition to using different peak detection algorithms, most tools require manual parameter tuning to ensure good results. Such practice often leads to results that vary significantly ( 25 ). To mitigate this issue, MetaboAnalyst 5.0 introduced an auto-optimized LC–MS processing pipeline to minimize parameter-related effects ( 10 , 26 ). The asari software introduced a set of quality metrics, the concepts of mass tracks and composite mass tracks, and a new algorithmic design to minimize errors in feature correspondence. It requires minimal parameter tuning while achieving much faster computational performance ( 22 ). The asari algorithm is now available among the LC–MS spectra processing options, alongside the traditional approaches.

MS2 spectra processing and metabolite identification are important components of untargeted metabolomics. It is now recognized that MS2 spectral deconvolution is necessary to achieve high-quality compound identification results for both DDA and SWATH-DIA data ( 27–29 ). MetaboAnalyst 6.0 offers an efficient, auto-optimized pipeline for MS2 spectral deconvolution. The DDA data deconvolution method is derived from the DecoID algorithm ( 28 ), which employs a database-dependent regression model to deconvolve contaminated spectra. The SWATH-DIA data deconvolution algorithm is based on the DecoMetDIA method ( 29 ), with the core algorithm re-implemented using a Rcpp/C++ framework to achieve high performance. When MS2 spectra replicates are provided, an extra step will be performed to generate consensus spectra across replicates. The consensus spectra are searched against MetaboAnalyst's curated MS2 reference databases for compound identification based on dot product ( 28 ) or spectral entropy ( 30 ) similarity scores. The complete pipelines for DDA and SWATH-DIA are available from the Spectra Processing [LC–MS w/wo MS2] module.
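
As a rough illustration of the two similarity scores mentioned above, the Python sketch below computes a plain dot-product (cosine) score and a spectral entropy score for two spectra that have already been binned onto a shared m/z grid. This is a simplification, not MetaboAnalyst's implementation: a real search engine matches fragment peaks within an m/z tolerance rather than assuming pre-aligned bins, and the intensity values here are invented.

```python
import numpy as np

def _normalize(spec):
    """Scale intensities so they sum to 1 (treat the spectrum as a distribution)."""
    spec = np.asarray(spec, dtype=float)
    return spec / spec.sum()

def dot_product_similarity(a, b):
    """Plain cosine similarity between two intensity vectors on a shared m/z grid."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def _shannon_entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def entropy_similarity(a, b):
    """Simplified spectral entropy similarity (after Li et al. 2021):
    1 - (2*S_AB - S_A - S_B) / ln(4) on intensity-normalized spectra,
    where S_AB is the entropy of the merged (averaged) spectrum."""
    pa, pb = _normalize(a), _normalize(b)
    pab = _normalize(pa + pb)
    s_a, s_b, s_ab = _shannon_entropy(pa), _shannon_entropy(pb), _shannon_entropy(pab)
    return 1.0 - (2.0 * s_ab - s_a - s_b) / np.log(4)

# Toy query and reference spectra, already binned onto the same five m/z bins.
query = [0.0, 100.0, 35.0, 0.0, 12.0]
reference = [5.0, 90.0, 40.0, 0.0, 10.0]
print(dot_product_similarity(query, reference))
print(entropy_similarity(query, reference))
```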

Raw spectra must be saved in common open formats and uploaded individually as separate zip files. LC–MS spectra are mandatory, while MS2 spectra are optional. Upon data upload, MetaboAnalyst 6.0 first validates the status of the MS files. For SWATH-DIA data, the SWATH window design is automatically extracted from the spectra. If the related information is missing, users will be prompted to manually enter the window design. On the parameter settings page, users can choose the auto-optimized centWave algorithm ( 26 ) or the asari algorithm for LC–MS data processing. If MS2 data are included, spectra deconvolution, consensus building, and database searching will be performed using the identified MS features as the target list. Once spectra processing is complete, users can explore both MS and MS2 data processing results (Figure 2A-B) and download the files or go directly to the Functional Analysis module.

Figure 2: Example outputs from MetaboAnalyst 6.0. (A) Integrated 3D PCA score and loading plots summarizing the raw spectra processing results. (B) An interactive mirror plot showing the MS2 matching result. Matched fragments are marked with a red diamond. (C) Functional analysis results with the top four significant pathways labelled. (D) A forest plot comparing the effect sizes calculated based on individual SNPs (black) or using all SNPs by different MR methods (red). (E) Bar plots of the dose-response curve fitting results showing how many times each model type was identified as the best fit. (F) A dose-response curve fitting result showing each of the concentration values (black points), the fitted curve (solid blue line), and the estimated benchmark dose (solid red line) with its lower and upper 95% confidence intervals (dashed red lines), respectively.


MS2 data could be acquired independently from MS data acquisition. To accommodate this scenario and offer compatibility with MS2 spectra results from other popular tools such as MS-DIAL, we have added a Peak Annotation [MS2-DDA/DIA] module to allow users to directly upload MS2 spectra for database searching. Users can enter a single MS2 spectrum or upload an MSP or MGF file containing multiple MS2 spectra. For single spectrum searching, users must specify the m/z value of the precursor ion. However, for batch searching based on an MSP file, users do not need to specify the precursors’ m/z values. To ensure timely completion of database searching, the public server processes only 20 spectra for each submission (the first 20 spectra by default). Users can manually specify spectra for searching. After conducting this pilot analysis with 20 spectra, users can download the R command history and use our MetaboAnalystR package to annotate all MS2 spectra ( 26 ).
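
For readers preparing their own batch uploads, MSP is a plain-text format in which each record typically consists of 'Key: value' header lines (such as the precursor m/z), a 'Num Peaks' line, and a list of m/z-intensity pairs, with records separated by blank lines. The sketch below is a deliberately simplified Python reader for that layout; it is not MetaboAnalyst's or MetaboAnalystR's parser, and real MSP files vary in field names and conventions.

```python
def parse_msp(path):
    """Very simplified MSP reader: yields dicts with header fields and a peak list.
    Real MSP files vary (field names, charge notation, comment blocks); this sketch
    only handles 'Key: value' headers and 'm/z intensity' peak lines."""
    record = {"peaks": []}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:                      # blank line ends a record
                if record["peaks"]:
                    yield record
                record = {"peaks": []}
            elif ":" in line and not line[0].isdigit():
                key, value = line.split(":", 1)
                record[key.strip().lower()] = value.strip()
            else:                             # peak line: "mz intensity"
                mz, intensity = line.split()[:2]
                record["peaks"].append((float(mz), float(intensity)))
    if record["peaks"]:                       # last record if file has no trailing blank line
        yield record

# Hypothetical usage: keep only the first 20 spectra, mirroring the server's default limit.
# spectra = list(parse_msp("my_ms2_spectra.msp"))[:20]
```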

Multiple databases are available for compound identification. Database searching can be performed based on regular reference MS2 spectra and/or their corresponding neutral loss spectra. The results are visually summarized as mirror plots based on the matching scores (Figure 2B). Users can interactively explore the MS2 database matching results. The molecular formulas for the MS2 peaks in the reference database spectra are predicted using the BUDDY program ( 31 ). Users can download the complete compound identification table together with the mirror plots.

Understanding the causal relationships between metabolites and phenotypes is of great interest in both metabolomics and exposomics. GWAS have established links between genetic variants (e.g. single nucleotide polymorphisms, or SNPs) and various phenotypes ( 32 ), while recent mGWAS provide connections between genotypes and metabolites or metabolite concentration changes. It is now possible to estimate causal relationships between metabolites and a phenotype of interest. If a metabolite is causal for a given disease, genetic variants that influence the levels of that metabolite, either directly through affecting related enzymes or indirectly through influencing lifestyle choices (such as dietary habits), should result in a higher risk of the disease. These causal effects can be estimated through Mendelian randomization (MR) analysis ( 18 ). MR relies on the principle that genetic variants are randomly distributed across populations, similar to how treatments are randomly assigned in clinical trials. By leveraging this random allocation, MR can evaluate whether a relationship between a metabolite and a phenotype is causal, while reducing the impact of confounding factors and reverse causality that often plague observational studies.

MR analysis in MetaboAnalyst is based on the 2SMR approach (using the TwoSampleMR and MRInstruments R packages), which enables the application of MR methods using summary statistics from non-overlapping individuals ( 17 , 33 ). Users first select an exposure (i.e. a metabolite) and an outcome (i.e. a disease) of interest. Based on these selections, the program searches for potential instrumental variables (i.e. SNPs) that are associated with both the metabolite, from our large collection of recent mGWAS studies ( 14 ), and the disease, from the OpenGWAS database ( 17 ). The next step is to perform SNP filtering and harmonization to identify independent SNPs through linkage disequilibrium (LD) clumping ( 34 ). When SNPs are absent from the GWAS database, proxy SNPs are identified using LD. In addition, it is critical to harmonize SNPs to make sure the effect sizes for the SNPs on both the exposure and the outcome refer to the same reference alleles. The last step before conducting MR analysis is to exclude SNPs affecting multiple metabolites to reduce horizontal pleiotropy, which occurs when a genetic variant influences the outcome through pathways other than the exposure of interest ( 35 ). MetaboAnalyst's MR analysis page provides diverse statistical methods (currently 12), each of which has its own strengths and limitations. For instance, the weighted median method is robust to violation of MR assumptions by some of the genetic variants, while the Egger regression method is more robust to horizontal pleiotropy. Users can hover over the question marks beside each method to learn more.
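
MetaboAnalyst delegates the actual MR computations to the TwoSampleMR R package, but the idea behind the simplest of these estimators, the fixed-effect inverse-variance-weighted (IVW) method, is easy to state: each SNP yields a Wald ratio (its effect on the outcome divided by its effect on the exposure), and the ratios are averaged with inverse-variance weights. The Python sketch below illustrates this with made-up summary statistics; it is not the platform's implementation and assumes valid instruments with no pleiotropy.

```python
import numpy as np

def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted (IVW) Mendelian randomization estimate.

    Each SNP contributes a Wald ratio beta_outcome / beta_exposure; ratios are
    combined with weights 1 / se_ratio^2, where se_ratio ~ se_outcome / |beta_exposure|
    (a first-order approximation that ignores uncertainty in the exposure betas)."""
    bx = np.asarray(beta_exposure, float)
    by = np.asarray(beta_outcome, float)
    se = np.asarray(se_outcome, float)

    ratio = by / bx
    weight = (bx / se) ** 2          # = 1 / (se_outcome / |bx|)**2
    estimate = np.sum(weight * ratio) / np.sum(weight)
    std_error = np.sqrt(1.0 / np.sum(weight))
    return estimate, std_error

# Made-up summary statistics for five independent SNPs.
bx = [0.12, 0.08, 0.15, 0.10, 0.09]       # SNP effects on the metabolite (exposure)
by = [0.030, 0.022, 0.041, 0.018, 0.027]  # SNP effects on the disease (outcome)
se = [0.010, 0.012, 0.011, 0.009, 0.013]  # standard errors of the outcome effects

est, se_est = ivw_mr(bx, by, se)
print(f"IVW causal estimate: {est:.3f} (95% CI half-width {1.96 * se_est:.3f})")
```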

Dose–response analysis is commonly used in toxicology and pharmacology for understanding how varying concentrations of a chemical can impact a biological system. It plays a pivotal role in risk assessment of chemical exposures ( 36 ). A key output of dose-response analysis is the benchmark dose (BMD), the minimum dose of a substance that produces a clear, low level health risk relative to the control group ( 37 ). Chemicals identified from exposomics are often followed up by dose–response studies to understand their mechanism of action or adverse outcome pathways ( 21 , 38 , 39 ).

A dose–response experiment includes a control group (dose = 0) and at least three different dose groups, typically with the same number of replicates in each group. The data should be formatted as a CSV file with the dose information included as the second row or column. The analysis workflow consists of four main steps: (i) data upload, integrity checking, processing and normalization; (ii) differential analysis to select features that vary with dose levels; (iii) curve fitting on the intensity or concentration values of those selected features against a suite of linear and non-linear models, and (iv) computing BMD values for each feature. The algorithm for dose–response analysis was adapted from the algorithm we developed for transcriptomics BMD analysis ( 40 , 41 ).
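
To make the curve-fitting and BMD steps concrete, here is an illustrative Python sketch (not MetaboAnalyst's implementation) that fits a four-parameter Hill model to a single feature with scipy and defines the benchmark dose as the dose at which the fitted curve departs from the control mean by one control standard deviation. The data, the choice of model, and the benchmark-response definition are all simplifying assumptions; the platform fits a suite of linear and non-linear models per feature.

```python
import numpy as np
from scipy.optimize import curve_fit, brentq

def hill(dose, bottom, top, ec50, n):
    """Four-parameter Hill model for feature intensity as a function of dose."""
    return bottom + (top - bottom) * dose**n / (ec50**n + dose**n)

# Made-up dose-response data: control (dose 0) plus three dose groups, 5 replicates each.
doses = np.repeat([0.0, 1.0, 10.0, 100.0], 5)
rng = np.random.default_rng(0)
intensity = hill(doses, 100, 180, 8.0, 1.5) + rng.normal(0, 5, doses.size)

params, _ = curve_fit(
    hill, doses, intensity,
    p0=[intensity.min(), intensity.max(), 10.0, 1.0],
    bounds=([0, 0, 1e-3, 0.1], [np.inf, np.inf, np.inf, 10]),
)

# Benchmark response level: control mean plus one control standard deviation.
control = intensity[doses == 0]
bmr_level = control.mean() + control.std(ddof=1)

# Benchmark dose: the dose at which the fitted curve crosses the BMR level.
bmd = brentq(lambda d: hill(d, *params) - bmr_level, 1e-6, doses.max())
print(f"Fitted Hill parameters: {params}")
print(f"Estimated benchmark dose (BMD): {bmd:.2f}")
```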

Compound database

The compound database has been updated based on HMDB 5.0 ( 42 ), with particular efforts made to synchronize with the IDs of other databases such as KEGG ( 43 ) and PubChem ( 44 ) to improve cross-references during compound mapping and pathway analysis. The compound database was expanded by ∼4000 compounds (after removing ∼10 000 deprecated HMDB entries and adding ∼14 000 new entries).

MS2 reference spectra database

A total of 12 MS2 reference databases were collected and curated from public resources, including the HMDB experimental MS2 database ( 42 ), the HMDB predicted MS2 database ( 42 ), the Global Natural Product Social Molecular Networking (GNPS) database ( 45 ), MoNA ( 46 ), MassBank ( 46 ), MINEs ( 47 ), LipidBlast ( 48 ), RIKEN ( 49 ), ReSpect ( 50 ), BMDMS ( 51 ), VaniyaNP ( 46 ) and the MS-DIAL database (v4.90) ( 52 ). The complete MS2 reference database currently comprises 10 420 215 MS2 records from 1 551 012 unique compounds. We also created a neutral loss spectra database calculated based on the algorithm implemented by the METLIN neutral loss database ( 53 ). The molecular formulas of all MS2 fragments were pre-calculated using BUDDY ( 31 ).

Pathway and metabolite set libraries

The KEGG pathway libraries have been updated to the most recent version (20 December 2023) via the KEGG API. Based on user feedback, pathway analysis for both targeted and untargeted metabolomics data now supports ∼130 species (up from 28 species in version 5.0), including many new mammals, plants, insects, fungi and bacteria. We also updated the metabolite set libraries based on HMDB 5.0, MarkerDB ( 54 ), as well as manual curation. For instance, a total of 62 metabolite sets associated with dietary and chemical exposures were added during this process. The metabolite set library also incorporates ∼3700 pathways downloaded from RaMP-DB ( 55 ).

Statistical analysis with complex metadata

The Statistical Analysis [metadata table] module in MetaboAnalyst 6.0 now provides a comprehensive suite of methods for analyzing and visualizing metabolomics data in relation to various metadata, be it discrete or continuous. Users can quickly assess the correlation patterns among different experimental factors using the metadata overview heatmaps or interactive PCA visualization. The interactive heatmap visualization coupled with hierarchical clustering allows users to easily explore feature abundance variations across different samples and metadata variables. The statistical methods in this module include both univariate linear models with covariate adjustment as well as multivariate methods such as ANOVA Simultaneous Component Analysis ( 56 , 57 ). Random forest is offered for classification with consideration of different metadata variables of interest. More details about this module can be found in our recently published protocol ( 11 ).
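
As an illustration of what "univariate linear models with covariate adjustment" means in practice, the Python/statsmodels sketch below tests a group effect on a single, invented metabolite while adjusting for age and sex. This mirrors the general idea rather than MetaboAnalyst's exact implementation, which fits such models per feature (e.g. via limma-style moderated statistics) and corrects for multiple testing.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical tidy table: one metabolite's abundance with study metadata.
df = pd.DataFrame({
    "abundance": [5.1, 4.8, 6.3, 6.9, 5.0, 4.6, 7.1, 6.5],
    "group":     ["control", "control", "treated", "treated"] * 2,
    "age":       [34, 41, 29, 52, 45, 38, 31, 47],
    "sex":       ["F", "M", "F", "M", "M", "F", "F", "M"],
})

# Univariate linear model with covariate adjustment: test the group effect on
# this metabolite while adjusting for age and sex.
model = smf.ols("abundance ~ C(group) + age + C(sex)", data=df).fit()
print(model.summary().tables[1])   # coefficient table, including the group term
```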

Enhanced functional analysis for untargeted metabolomics

Functional analysis of untargeted metabolomics data was first established in MetaboAnalyst 4.0, based on mummichog and Gene Set Enrichment Analysis (GSEA) ( 58 ). It was further enhanced in MetaboAnalyst 5.0 by incorporating retention time into the calculation of empirical compounds. MetaboAnalyst 6.0 now allows users to upload an LC–MS peak list along with a corresponding MS2-based compound list to filter out unrealistic empirical compounds, further improving the accuracy of functional analysis ( 59 ).

Enhanced data visualization support

We have enhanced the quality of the interactive and synchronized 3D plots across the dimensionality reduction methods (PCA, PLS-DA, sPLS-DA) used in MetaboAnalyst based on the powerful three.js library ( https://threejs.org/ ). New features include customizable backgrounds, data point annotations and confidence ellipsoids (Figure 2A ). We have also implemented interactive plots for clustering heatmaps in the Statistical Analysis modules to better support visual exploration of large data matrices typical in untargeted metabolomics. Both mouse-over and zoom-in functionalities are supported to allow users to examine specific features or patterns of interest. In addition to these enhancements, we also updated the visualization for KEGG’s global metabolic network ( 43 ).

To illustrate the utility of the new features of MetaboAnalyst 6.0, we used a metabolomics dataset collected in-house that aimed at studying glucose-induced insulin secretion in isolated human islets. The dataset contains five samples of high-glucose (16.7 mM) exposures and five samples of low-glucose (2.8 mM) exposures, both for 30 min, plus five quality control (QC) samples. The LC–MS spectra were collected using our Q-Exactive Orbitrap platform (Thermo Scientific, Waltham, MA, USA), together with three SWATH-DIA acquisitions from the pooled QC. The spectra were first centroided and converted into mzML format using ProteoWizard ( 60 , 61 ) and uploaded to MetaboAnalyst 6.0. LC–MS spectra processing was performed using the asari algorithm. All detected MS1 features were used as a target list for MS2 deconvolution and database searching. A total of 27 209 MS1 features were detected, with 4959 of them assigned at least one potential named chemical identity. Functional analysis using the mummichog algorithm indicated that compounds showing significant changes between the high-glucose and low-glucose groups were involved in the Carnitine shuttle, Caffeine, Tryptophan, and Coenzyme A metabolism pathways (Figure 2C). These pathways have been consistently identified in previous studies ( 62–65 ). Finally, we performed a causal analysis on the association between one of the significant metabolites identified, L-Cystathionine, and type 2 diabetes (GWAS ID: finn-b-E4_DM2). The default parameters were used for both SNP filtering and harmonization, as well as MR analysis. Based on these results, a significantly altered cystathionine level was found to have a causal effect on type 2 diabetes (Figure 2D), which aligns well with a recently published study ( 66 ). This case study highlights how MetaboAnalyst 6.0 allows users to investigate the chemical identities of MS peaks and elucidate associations between metabolites and phenotypes to unveil previously unknown functional insights. To showcase the dose-response analysis module, we utilized a published dataset collected from BT549 breast cancer cells treated with four different doses of etomoxir ( 21 ). Figure 2E summarizes the results from dose-response modeling. Figure 2F shows an example feature-level BMD calculated based on the fitted curve. The workflow is included as a series of tutorials on our website.

Several web-based tools have been developed to address various aspects of metabolomics data processing, statistical analysis, functional interpretation, and results visualization. Table 1 compares the main features of MetaboAnalyst 6.0 with those of its previous version and other popular tools, including XCMS Online ( 23 ), GNPS ( 45 ), Workflow4Metabolomics (W4M) ( 67 ) and MetExplore ( 68 ). For raw data processing, MetaboAnalyst primarily focuses on supporting LC–MS data, whereas W4M also supports GC–MS and NMR raw data processing, and GNPS emphasizes MS2-based compound identification via molecular networks. In comparison, MetaboAnalyst provides an auto-optimized workflow along with an additional algorithm (asari) for efficient LC–MS spectra processing, together with more extensive MS2 spectra libraries for compound identification. In terms of statistical analysis, MetaboAnalyst 6.0 has introduced new modules for dealing with complex metadata, causal analysis and dose–response analysis, while maintaining all other functionalities. MetaboAnalyst contains unique features for enrichment and pathway analysis, and these strengths were further improved in version 6.0 with the addition of unique functions and support for more species. For network analysis and integration, MetExplore specializes in metabolic network visualization and integration with other omics; these features are addressed by our companion tool, OmicsNet ( 69 ). Overall, MetaboAnalyst 6.0 continues to be the most comprehensive tool for metabolomics data processing, analysis and interpretation.

Comparison of MetaboAnalyst 6.0 with its previous version and other common web-based metabolomics tools. Symbols used for feature evaluations with ‘√’ for present, ‘-’ for absent, and ‘+’ for a more quantitative assessment (more ‘+’ indicate better support)

• XCMS online: https://xcmsonline.scripps.edu/ .

• GNPS: https://gnps.ucsd.edu/ .

• Workflow4Metabolomics (W4M): https://workflow4metabolomics.org/ .

• MetExplore: https://metexplore.toulouse.inra.fr/metexplore2/

By incorporating a new MS2 data processing workflow, MetaboAnalyst 6.0 now offers a web-based, end-to-end platform for metabolomics data analysis. The workflow spans from raw MS spectra processing to compound identification to functional analysis. A key motivation in developing version 6.0 was to support the data analysis needs emerging from exposomics and follow-up validation studies. The new statistical analysis module specifically takes complex metadata into account to better identify robust associations. From these associations, users can perform causal analysis based on 2SMR to narrow down candidate compounds. The remaining compounds can be validated through dose-response studies based on in vitro or animal models. Our case study highlights the streamlined analysis workflow from raw spectra processing to compound annotation, to functional interpretation, and finally to causal insights. In conclusion, MetaboAnalyst 6.0 is a user-friendly platform for comprehensive analysis of metabolomics data that helps address emerging needs from recent exposomics research. For future directions, we will continue to improve metabolome annotations, better integrate with other omics data, and explore new ways to interact with users via generative artificial intelligence technologies ( 70–73 ).

MetaboAnalyst 6.0 is freely available at https://www.metaboanalyst.ca. No login is required.

Human islets for research were provided by the Alberta Diabetes Institute IsletCore at the University of Alberta in Edmonton with the assistance of the Human Organ Procurement and Exchange (HOPE) program, Trillium Gift of Life Network (TGLN), and other Canadian organ procurement organizations. Islet isolation was approved by the Human Research Ethics Board at the University of Alberta (Pro00013094). All donors’ families gave informed consent for the use of pancreatic tissue in research.

This research was funded by Genome Canada, Canadian Foundation for Innovation (CFI), US National Institutes of Health (U01 CA235493), Canadian Institutes of Health Research (CIHR), Juvenile Diabetes Research Foundation (JDRF), Natural Sciences and Engineering Research Council of Canada (NSERC), and Diabetes Canada. Funding for open access charge: NSERC.

Conflict of interest statement . J. Xia is the founder of XiaLab Analytics.

Lloyd-Price   J. , Arze   C. , Ananthakrishnan   A.N. , Schirmer   M. , Avila-Pacheco   J. , Poon   T.W. , Andrews   E. , Ajami   N.J. , Bonham   K.S. , Brislawn   C.J.  et al. .   Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases . Nature . 2019 ; 569 : 655 – 662 .


Utpott   M. , Rodrigues   E. , Rios   A.O. , Mercali   G.D. , Flores   S.H.   Metabolomics: an analytical technique for food processing evaluation . Food Chem.   2022 ; 366 : 130685 .

Wishart   D.S.   Metabolomics for investigating physiological and pathophysiological processes . Physiol. Rev.   2019 ; 99 : 1819 – 1875 .

Vermeulen   R. , Schymanski   E.L. , Barabasi   A.L. , Miller   G.W.   The exposome and health: where chemistry meets biology . Science . 2020 ; 367 : 392 – 396 .

Danzi   F. , Pacchiana   R. , Mafficini   A. , Scupoli   M.T. , Scarpa   A. , Donadelli   M. , Fiore   A.   To metabolomics and beyond: a technological portfolio to investigate cancer metabolism . Signal. Transduct. Target. Ther.   2023 ; 8 : 137 .

Xia   J. , Psychogios   N. , Young   N. , Wishart   D.S.   MetaboAnalyst: a web server for metabolomic data analysis and interpretation . Nucleic Acids Res.   2009 ; 37 : W652 – W660 .

Xia   J. , Mandal   R. , Sinelnikov   I.V. , Broadhurst   D. , Wishart   D.S.   MetaboAnalyst 2.0—A comprehensive server for metabolomic data analysis . Nucleic Acids Res.   2012 ; 40 : W127 – W133 .

Xia   J. , Sinelnikov   I.V. , Han   B. , Wishart   D.S.   MetaboAnalyst 3.0—Making metabolomics more meaningful . Nucleic Acids Res.   2015 ; 43 : W251 – W257 .

Chong   J. , Soufan   O. , Li   C. , Caraus   I. , Li   S. , Bourque   G. , Wishart   D.S. , Xia   J.   MetaboAnalyst 4.0: towards more transparent and integrative metabolomics analysis . Nucleic Acids Res.   2018 ; 46 : W486 – W494 .

Pang   Z. , Chong   J. , Zhou   G. , de Lima Morais   D.A. , Chang   L. , Barrette   M. , Gauthier   C. , Jacques   P.-É. , Li   S. , Xia   J.   MetaboAnalyst 5.0: narrowing the gap between raw spectra and functional insights . Nucleic Acids Res.   2021 ; 49 : W388 – W396 .

Pang   Z. , Zhou   G. , Ewald   J. , Chang   L. , Hacariz   O. , Basu   N. , Xia   J.   Using MetaboAnalyst 5.0 for LC–HRMS spectra processing, multi-omics integration and covariate adjustment of global metabolomics data . Nat. Protoc.   2022 ; 17 : 1735 – 1761 .

Frigerio   G. , Moruzzi   C. , Mercadante   R. , Schymanski   E.L. , Fustinoni   S.   Development and application of an LC–MS/MS untargeted exposomics method with a separated pooled quality control strategy . Molecules . 2022 ; 27 : 2580 .

Tsugawa   H. , Cajka   T. , Kind   T. , Ma   Y. , Higgins   B. , Ikeda   K. , Kanazawa   M. , VanderGheynst   J. , Fiehn   O. , Arita   M.   MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis . Nat. Methods . 2015 ; 12 : 523 – 526 .

Chang   L. , Zhou   G. , Xia   J.   mGWAS-Explorer 2.0: causal analysis and interpretation of metabolite-phenotype associations . Metabolites . 2023 ; 13 : 826 .

Shin   S.Y. , Fauman   E.B. , Petersen   A.K. , Krumsiek   J. , Santos   R. , Huang   J. , Arnold   M. , Erte   I. , Forgetta   V. , Yang   T.P.  et al. .   An atlas of genetic influences on human blood metabolites . Nat. Genet.   2014 ; 46 : 543 – 550 .

Chen   Y. , Lu   T. , Pettersson-Kymmer   U. , Stewart   I.D. , Butler-Laporte   G. , Nakanishi   T. , Cerani   A. , Liang   K.Y.H. , Yoshiji   S. , Willett   J.D.S.  et al. .   Genomic atlas of the plasma metabolome prioritizes metabolites implicated in human diseases . Nat. Genet.   2023 ; 55 : 44 – 53 .

Hemani   G. , Zheng   J. , Elsworth   B. , Wade   K.H. , Haberland   V. , Baird   D. , Laurin   C. , Burgess   S. , Bowden   J. , Langdon   R.  et al. .   The MR-Base platform supports systematic causal inference across the human phenome . eLife . 2018 ; 7 : e34408 .

Sanderson   E. , Glymour   M.M. , Holmes   M.V. , Kang   H. , Morrison   J. , Munafò   M.R. , Palmer   T. , Schooling   C.M. , Wallace   C. , Zhao   Q.  et al. .   Mendelian randomization . Nat. Rev. Methods Primers . 2022 ; 2 : 6 .

Zhao   H. , Liu   M. , Lv   Y. , Fang   M.   Dose-response metabolomics and pathway sensitivity to map molecular cartography of bisphenol A exposure . Environ. Int.   2022 ; 158 : 106893 .

Thomas   R.S. , Wesselkamper   S.C. , Wang   N.C.Y. , Zhao   Q.J. , Petersen   D.D. , Lambert   J.C. , Cote   I. , Yang   L. , Healy   E. , Black   M.B.  et al. .   Temporal concordance between apical and transcriptional points of departure for chemical risk assessment . Toxicol. Sci.   2013 ; 134 : 180 – 194 .

Yao   C.-H. , Wang   L. , Stancliffe   E. , Sindelar   M. , Cho   K. , Yin   W. , Wang   Y. , Patti   G.J.   Dose-response metabolomics to understand biochemical mechanisms and off-target drug effects with the TOXcms software . Anal. Chem.   2020 ; 92 : 1856 – 1864 .

Li   S. , Siddiqa   A. , Thapa   M. , Chi   Y. , Zheng   S.   Trackable and scalable LC–MS metabolomics data processing using asari . Nat. Commun.   2023 ; 14 : 4113 .

Tautenhahn   R. , Patti   G.J. , Rinehart   D. , Siuzdak   G.   XCMS Online: a web-based platform to process untargeted metabolomic data . Anal. Chem.   2012 ; 84 : 5035 – 5039 .

Schmid   R. , Heuckeroth   S. , Korf   A. , Smirnov   A. , Myers   O. , Dyrlund   T.S. , Bushuiev   R. , Murray   K.J. , Hoffmann   N. , Lu   M.  et al. .   Integrative analysis of multimodal mass spectrometry data in MZmine 3 . Nat. Biotechnol.   2023 ; 41 : 447 – 449 .

Myers   O.D. , Sumner   S.J. , Li   S. , Barnes   S. , Du   X.   Detailed investigation and comparison of the XCMS and MZmine 2 chromatogram construction and chromatographic peak detection methods for preprocessing mass spectrometry metabolomics data . Anal. Chem.   2017 ; 89 : 8689 – 8695 .

Pang   Z. , Chong   J. , Li   S. , Xia   J.   MetaboAnalystR 3.0: toward an optimized workflow for global metabolomics . Metabolites . 2020 ; 10 : 186 .

Xing   S. , Yu   H. , Liu   M. , Jia   Q. , Sun   Z. , Fang   M. , Huan   T.   Recognizing contamination fragment ions in liquid chromatography–Tandem mass spectrometry data . J. Am. Soc. Mass. Spectrom.   2021 ; 32 : 2296 – 2305 .

Stancliffe   E. , Schwaiger-Haber   M. , Sindelar   M. , Patti   G.J.   DecoID improves identification rates in metabolomics through database-assisted MS/MS deconvolution . Nat. Methods . 2021 ; 18 : 779 – 787 .

Yin   Y. , Wang   R. , Cai   Y. , Wang   Z. , Zhu   Z.-J.   DecoMetDIA: deconvolution of multiplexed MS/MS spectra for metabolite identification in SWATH-MS-based untargeted metabolomics . Anal. Chem.   2019 ; 91 : 11897 – 11904 .

Li   Y. , Kind   T. , Folz   J. , Vaniya   A. , Mehta   S.S. , Fiehn   O.   Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification . Nat. Methods . 2021 ; 18 : 1524 – 1531 .

Xing   S. , Shen   S. , Xu   B. , Li   X. , Huan   T.   BUDDY: molecular formula discovery via bottom-up MS/MS interrogation . Nat. Methods . 2023 ; 20 : 881 – 890 .

Sollis   E. , Mosaku   A. , Abid   A. , Buniello   A. , Cerezo   M. , Gil   L. , Groza   T. , Gunes   O. , Hall   P. , Hayhurst   J.  et al. .   The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource . Nucleic Acids Res.   2023 ; 51 : D977 – D985 .

Hemani   G. , Tilling   K. , Davey Smith   G.   Orienting the causal relationship between imprecisely measured traits using GWAS summary data . PLos Genet.   2017 ; 13 : e1007081 .

Marees   A.T. , de Kluiver   H. , Stringer   S. , Vorspan   F. , Curis   E. , Marie-Claire   C. , Derks   E.M.   A tutorial on conducting genome-wide association studies: quality control and statistical analysis . Int. J. Methods Psychiatr. Res.   2018 ; 27 : e1608 .

de Leeuw   C. , Savage   J. , Bucur   I.G. , Heskes   T. , Posthuma   D   Understanding the assumptions underlying mendelian randomization . Eur. J. Hum. Genet.   2022 ; 30 : 653 – 660 .

Altshuler   B.   Modeling of dose-response relationships . Environ. Health Perspect.   1981 ; 42 : 23 – 27 .

Thomas   R.S. , Wesselkamper   S.C. , Wang   N.C. , Zhao   Q.J. , Petersen   D.D. , Lambert   J.C. , Cote   I. , Yang   L. , Healy   E. , Black   M.B.  et al. .   Temporal concordance between apical and transcriptional points of departure for chemical risk assessment . Toxicol. Sci.   2013 ; 134 : 180 – 194 .

Kleensang   A. , Maertens   A. , Rosenberg   M. , Fitzpatrick   S. , Lamb   J. , Auerbach   S. , Brennan   R. , Crofton   K.M. , Gordon   B. , Fornace   A.J.  Jr  et al. .   Pathways of toxicity . ALTEX . 2014 ; 31 : 53 – 61 .

Ewald   J. , Soufan   O. , Xia   J. , Basu   N.   FastBMD: an online tool for rapid benchmark dose–response analysis of transcriptomics data . Bioinformatics . 2020 ; 37 : 1035 – 1036 .

Ewald   J. , Zhou   G. , Lu   Y. , Xia   J.   Using ExpressAnalyst for comprehensive gene expression analysis in model and non-model organisms . Curr Protoc . 2023 ; 3 : e922 .

Wishart   D.S. , Guo   A. , Oler   E. , Wang   F. , Anjum   A. , Peters   H. , Dizon   R. , Sayeeda   Z. , Tian   S. , Lee   B.L.  et al. .   HMDB 5.0: the Human Metabolome Database for 2022 . Nucleic Acids Res.   2021 ; 50 : D622 – D631 .

Kanehisa   M. , Furumichi   M. , Sato   Y. , Kawashima   M. , Ishiguro-Watanabe   M.   KEGG for taxonomy-based analysis of pathways and genomes . Nucleic Acids Res.   2022 ; 51 : D587 – D592 .

Kim   S.   Exploring chemical information in PubChem . Curr. Protoc.   2021 ; 1 : e217 .

Aron   A.T. , Gentry   E.C. , McPhail   K.L. , Nothias   L.-F. , Nothias-Esposito   M. , Bouslimani   A. , Petras   D. , Gauglitz   J.M. , Sikora   N. , Vargas   F.  et al. .   Reproducible molecular networking of untargeted mass spectrometry data using GNPS . Nat. Protoc.   2020 ; 15 : 1954 – 1991 .

Horai   H. , Arita   M. , Kanaya   S. , Nihei   Y. , Ikeda   T. , Suwa   K. , Ojima   Y. , Tanaka   K. , Tanaka   S. , Aoshima   K.  et al. .   MassBank: a public repository for sharing mass spectral data for life sciences . J. Mass Spectrom.   2010 ; 45 : 703 – 714 .

Jeffryes   J.G. , Colastani   R.L. , Elbadawi-Sidhu   M. , Kind   T. , Niehaus   T.D. , Broadbelt   L.J. , Hanson   A.D. , Fiehn   O. , Tyo   K.E. , Henry   C.S.   MINEs: open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics . J Cheminform . 2015 ; 7 : 44 .

Kind   T. , Liu   K.-H. , Lee   D.Y. , DeFelice   B. , Meissen   J.K. , Fiehn   O.   LipidBlast in silico tandem mass spectrometry database for lipid identification . Nat. Methods . 2013 ; 10 : 755 – 758 .

Tsugawa   H. , Nakabayashi   R. , Mori   T. , Yamada   Y. , Takahashi   M. , Rai   A. , Sugiyama   R. , Yamamoto   H. , Nakaya   T. , Yamazaki   M.  et al. .   A cheminformatics approach to characterize metabolomes in stable-isotope-labeled organisms . Nat. Methods . 2019 ; 16 : 295 – 298 .

Sawada   Y. , Nakabayashi   R. , Yamada   Y. , Suzuki   M. , Sato   M. , Sakata   A. , Akiyama   K. , Sakurai   T. , Matsuda   F. , Aoki   T.  et al. .   RIKEN tandem mass spectral database (ReSpect) for phytochemicals: a plant-specific MS/MS-based data resource and database . Phytochemistry . 2012 ; 82 : 38 – 45 .

Lee   S. , Hwang   S. , Seo   M. , Shin   K.B. , Kim   K.H. , Park   G.W. , Kim   J.Y. , Yoo   J.S. , No   K.T.   BMDMS-NP: a comprehensive ESI-MS/MS spectral library of natural compounds . Phytochemistry . 2020 ; 177 : 112427 .

Tsugawa   H. , Ikeda   K. , Takahashi   M. , Satoh   A. , Mori   Y. , Uchino   H. , Okahashi   N. , Yamada   Y. , Tada   I. , Bonini   P.  et al. .   A lipidome atlas in MS-DIAL 4 . Nat. Biotechnol.   2020 ; 38 : 1159 – 1163 .

Aisporna   A. , Benton   H.P. , Chen   A. , Derks   R.J.E. , Galano   J.M. , Giera   M. , Siuzdak   G.   Neutral loss mass spectral data enhances molecular similarity analysis in METLIN . J. Am. Soc. Mass. Spectrom.   2022 ; 33 : 530 – 534 .

Wishart   D.S. , Bartok   B. , Oler   E. , Liang   K.Y.H. , Budinski   Z. , Berjanskii   M. , Guo   A. , Cao   X. , Wilson   M.   MarkerDB: an online database of molecular biomarkers . Nucleic Acids Res.   2021 ; 49 : D1259 – D1267 .

Braisted   J. , Patt   A. , Tindall   C. , Sheils   T. , Neyra   J. , Spencer   K. , Eicher   T. , Mathé   E.A.   RaMP-DB 2.0: a renovated knowledgebase for deriving biological and chemical insight from metabolites, proteins, and genes . Bioinformatics . 2023 ; 39 : btac726 .

Smilde   A.K. , Jansen   J.J. , Hoefsloot   H.C. , Lamers   R.J. , van der Greef   J. , Timmerman   M.E.   ANOVA-simultaneous component analysis (ASCA): a new tool for analyzing designed metabolomics data . Bioinformatics . 2005 ; 21 : 3043 – 3048 .

Ritchie   M.E. , Phipson   B. , Wu   D. , Hu   Y. , Law   C.W. , Shi   W. , Smyth   G.K.   limma powers differential expression analyses for RNA-sequencing and microarray studies . Nucleic Acids Res.   2015 ; 43 : e47 .

Li   S. , Park   Y. , Duraisingham   S. , Strobel   F.H. , Khan   N. , Soltow   Q.A. , Jones   D.P. , Pulendran   B.   Predicting network activity from high throughput metabolomics . PLoS Comput. Biol.   2013 ; 9 : e1003123 .

Lu   Y. , Pang   Z. , Xia   J.   Comprehensive investigation of pathway enrichment methods for functional interpretation of LC–MS global metabolomics data . Brief. Bioinform.   2023 ; 24 : bbac553 .

Chambers   M.C. , Maclean   B. , Burke   R. , Amodei   D. , Ruderman   D.L. , Neumann   S. , Gatto   L. , Fischer   B. , Pratt   B. , Egertson   J.  et al. .   A cross-platform toolkit for mass spectrometry and proteomics . Nat. Biotechnol.   2012 ; 30 : 918 – 920 .

Adusumilli   R. , Mallick   P.   Data conversion with ProteoWizard msConvert . Methods Mol. Biol.   2017 ; 1550 : 339 – 368 .

Bene   J. , Hadzsiev   K. , Melegh   B.   Role of carnitine and its derivatives in the development and management of type 2 diabetes . Nutr Diabetes . 2018 ; 8 : 8 .

Lane   J.D. , Barkauskas   C.E. , Surwit   R.S. , Feinglos   M.N.   Caffeine impairs glucose metabolism in type 2 diabetes . Diabetes Care.   2004 ; 27 : 2047 – 2048 .

Unluturk   U. , Erbas   T. Engin   A. , Engin   A.B.   Tryptophan Metabolism: Implications for Biological Processes, Health and Disease . 2015 ; Cham Springer International Publishing 147 – 171 .


Jackowski   S. , Leonardi   R.   Deregulated coenzyme A, loss of metabolic flexibility and diabetes . Biochem. Soc. Trans.   2014 ; 42 : 1118 – 1122 .

Cruciani-Guglielmacci   C. , Meneyrol   K. , Denom   J. , Kassis   N. , Rachdi   L. , Makaci   F. , Migrenne-Li   S. , Daubigney   F. , Georgiadou   E. , Denis   R.G.  et al. .   Homocysteine metabolism pathway is involved in the control of glucose homeostasis: a cystathionine beta synthase deficiency study in mouse . Cells . 2022 ; 11 : 1737 .

Giacomoni   F. , Le Corguillé   G. , Monsoor   M. , Landi   M. , Pericard   P. , Pétéra   M. , Duperier   C. , Tremblay-Franco   M. , Martin   J.-F. , Jacob   D.  et al. .   Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics . Bioinformatics . 2014 ; 31 : 1493 – 1495 .

Cottret   L. , Frainay   C. , Chazalviel   M. , Cabanettes   F. , Gloaguen   Y. , Camenen   E. , Merlet   B. , Heux   S. , Portais   J.C. , Poupin   N.  et al. .   MetExplore: collaborative edition and exploration of metabolic networks . Nucleic Acids Res.   2018 ; 46 : W495 – W502 .

Zhou   G. , Pang   Z. , Lu   Y. , Ewald   J. , Xia   J.   OmicsNet 2.0: a web-based platform for multi-omics integration and network visual analytics . Nucleic Acids Res.   2022 ; 50 : W527 – W533 .

Lu   Y. , Zhou   G. , Ewald   J. , Pang   Z. , Shiri   T. , Xia   J.   MicrobiomeAnalyst 2.0: comprehensive statistical, functional and integrative analysis of microbiome data . Nucleic Acids Res.   2023 ; 51 : W310 – W318 .

Liu   P. , Ewald   J. , Pang   Z. , Legrand   E. , Jeon   Y.S. , Sangiovanni   J. , Hacariz   O. , Zhou   G. , Head   J.A. , Basu   N.  et al. .   ExpressAnalyst: a unified platform for RNA-sequencing analysis in non-model species . Nat. Commun.   2023 ; 14 : 2995 .

Zhou   G. , Ewald   J. , Xia   J.   OmicsAnalyst: a comprehensive web-based platform for visual analytics of multi-omics data . Nucleic Acids Res.   2021 ; 49 : W476 – W482 .

Moor   M. , Banerjee   O. , Abad   Z.S.H. , Krumholz   H.M. , Leskovec   J. , Topol   E.J. , Rajpurkar   P.   Foundation models for generalist medical artificial intelligence . Nature . 2023 ; 616 : 259 – 265 .


What Researchers Discovered When They Sent 80,000 Fake Résumés to U.S. Jobs

Some companies discriminated against Black applicants much more than others, and H.R. practices made a big difference.


By Claire Cain Miller and Josh Katz

A group of economists recently performed an experiment on around 100 of the largest companies in the country, applying for jobs using made-up résumés with equivalent qualifications but different personal characteristics. They changed applicants’ names to suggest that they were white or Black, and male or female — Latisha or Amy, Lamar or Adam.

On Monday, they released the names of the companies. On average, they found, employers contacted the presumed white applicants 9.5 percent more often than the presumed Black applicants.

Yet this practice varied significantly by firm and industry. One-fifth of the companies — many of them retailers or car dealers — were responsible for nearly half of the gap in callbacks to white and Black applicants.

Two companies favored white applicants over Black applicants significantly more than others. They were AutoNation, a used car retailer, which contacted presumed white applicants 43 percent more often, and Genuine Parts Company, which sells auto parts including under the NAPA brand, and called presumed white candidates 33 percent more often.

In a statement, Heather Ross, a spokeswoman for Genuine Parts, said, “We are always evaluating our practices to ensure inclusivity and break down barriers, and we will continue to do so.” AutoNation did not respond to a request for comment.

Companies With the Largest and Smallest Racial Contact Gaps

Of the 97 companies in the experiment, two stood out as contacting presumed white job applicants significantly more often than presumed Black ones. At 14 companies, there was little or no difference in how often they called back the presumed white or Black applicants.

Source: Patrick Kline, Evan K. Rose and Christopher R. Walters

Known as an audit study, the experiment was the largest of its kind in the United States: The researchers sent 80,000 résumés to 10,000 jobs from 2019 to 2021. The results demonstrate how entrenched employment discrimination is in parts of the U.S. labor market, and the extent to which Black workers start behind in certain industries.

“I am not in the least bit surprised,” said Daiquiri Steele, an assistant professor at the University of Alabama School of Law who previously worked for the Department of Labor on employment discrimination. “If you’re having trouble breaking in, the biggest issue is the ripple effect it has. It affects your wages and the economy of your community going forward.”

Some companies showed no difference in how they treated applications from people assumed to be white or Black. Their human resources practices — and one policy in particular (more on that later) — offer guidance for how companies can avoid biased decisions in the hiring process.

A lack of racial bias was more common in certain industries: food stores, including Kroger; food products, including Mondelez; freight and transport, including FedEx and Ryder; and wholesale, including Sysco and McLane Company.

“We want to bring people’s attention not only to the fact that racism is real, sexism is real, some are discriminating, but also that it’s possible to do better, and there’s something to be learned from those that have been doing a good job,” said Patrick Kline, an economist at the University of California, Berkeley, who conducted the study with Evan K. Rose at the University of Chicago and Christopher R. Walters at Berkeley.

The researchers first published details of their experiment in 2021, but without naming the companies. The new paper, which is set to run in the American Economic Review, names the companies and explains the methodology developed to group them by their performance, while accounting for statistical noise.

Sample Résumés From the Experiment

Fictitious résumés sent to large U.S. companies revealed a preference, on average, for candidates whose names suggested that they were white.


To assign names, the researchers started with a prior list that had been assembled using Massachusetts birth certificates from 1974 to 1979. They then supplemented this list with names found in a database of speeding tickets issued in North Carolina between 2006 and 2018, classifying a name as “distinctive” if more than 90 percent of people with that name were of a particular race.
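
As a toy illustration of that classification rule, the short pandas sketch below applies a 90 percent threshold to invented counts; the researchers' actual data and procedure were far more extensive.

```python
import pandas as pd

# Invented counts of people with each first name, by race, standing in for the
# birth-certificate and speeding-ticket records the researchers used.
counts = pd.DataFrame(
    {"name": ["Latisha", "Amy", "Lamar", "Adam", "Jordan"],
     "black": [1180, 40, 950, 55, 510],
     "white": [25, 1500, 60, 1400, 490]}
).set_index("name")

share = counts.div(counts.sum(axis=1), axis=0)       # racial share for each name
distinctive = share.max(axis=1) > 0.90               # >90% of one group
counts["distinctive_for"] = share.idxmax(axis=1).where(distinctive, "not distinctive")
print(counts)
```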

The study includes 97 firms. The jobs the researchers applied to were entry level, not requiring a college degree or substantial work experience. In addition to race and gender, the researchers tested other characteristics protected by law, like age and sexual orientation.

They sent up to 1,000 applications to each company, applying for as many as 125 jobs per company in locations nationwide, to try to uncover patterns in companies’ operations versus isolated instances. Then they tracked whether the employer contacted the applicant within 30 days.

A bias against Black names

Companies requiring lots of interaction with customers, like sales and retail, particularly in the auto sector, were most likely to show a preference for applicants presumed to be white. This was true even when applying for positions at those firms that didn't involve customer interaction, suggesting that discriminatory practices were baked into corporate culture or H.R. practices, the researchers said.

Still, there were exceptions — some of the companies exhibiting the least bias were retailers, like Lowe’s and Target.

The study may underestimate the rate of discrimination against Black applicants in the labor market as a whole because it tested large companies, which tend to discriminate less, said Lincoln Quillian, a sociologist at Northwestern who analyzes audit studies. It did not include names intended to represent Latino or Asian American applicants, but other research suggests that they are also contacted less than white applicants, though they face less discrimination than Black applicants.

The experiment ended in 2021, and some of the companies involved might have changed their practices since. Still, a review of all available audit studies found that discrimination against Black applicants had not changed in three decades. After the Black Lives Matter protests in 2020, such discrimination was found to have disappeared among certain employers, but the researchers behind that study said the effect was most likely short-lived.

Gender, age and L.G.B.T.Q. status

On average, companies did not treat male and female applicants differently. This aligns with other research showing that gender discrimination against women is rare in entry-level jobs, and starts later in careers.

However, when companies did favor men (especially in manufacturing) or women (mostly at apparel stores), the biases were much larger than for race. Builders FirstSource contacted presumed male applicants more than twice as often as female ones. Ascena, which owns brands like Ann Taylor, contacted women 66 percent more than men.

Neither company responded to requests for comment.

The consequences of being female differed by race. The differences were small, but being female was a slight benefit for white applicants, and a slight penalty for Black applicants.

The researchers also tested several other characteristics protected by law, with a smaller number of résumés. They found there was a small penalty for being over 40.

Overall, they found no penalty for using nonbinary pronouns. Being gay, as indicated by listing membership in an L.G.B.T.Q. club on the résumé, resulted in a slight penalty for white applicants but a slight benefit for Black applicants; although the effect was small, listing the club erased the racial penalty for those applicants.

Under the Civil Rights Act of 1964, discrimination is illegal even if it’s unintentional. Yet in the real world, it is difficult for job applicants to know why they did not hear back from a company.

“These practices are particularly challenging to address because applicants often do not know whether they are being discriminated against in the hiring process,” Brandalyn Bickner, a spokeswoman for the Equal Employment Opportunity Commission, said in a statement. (It has seen the data and spoken with the researchers, though it could not use an academic study as the basis for an investigation, she said.)

What companies can do to reduce discrimination

Several common measures — like employing a chief diversity officer, offering diversity training or having a diverse board — were not correlated with decreased discrimination in entry-level hiring, the researchers found.

But one thing strongly predicted less discrimination: a centralized H.R. operation.

The researchers recorded the voice mail messages that the fake applicants received. When a company’s calls came from fewer individual phone numbers, suggesting that they were originating from a central office, there tended to be less bias. When they came from individual hiring managers at local stores or warehouses, there was more. These messages often sounded frantic and informal, asking if an applicant could start the next day, for example.
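As a rough illustration of that proxy, centralization could be summarized by counting how many distinct phone numbers a company's callbacks came from; the input format below is an assumption, not the study's actual data.

```python
# Hypothetical sketch of the centralization proxy: fewer distinct callback
# numbers relative to total calls suggests a central recruiting office.
import pandas as pd

def centralization_index(calls: pd.DataFrame) -> pd.Series:
    """`calls`: one row per recorded callback, with 'company' and
    'caller_number' columns (e.g., taken from voice mail logs)."""
    distinct_numbers = calls.groupby("company")["caller_number"].nunique()
    total_calls = calls.groupby("company").size()
    # Lower values indicate more centralized calling patterns.
    return (distinct_numbers / total_calls).rename("numbers_per_call")
```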

“That’s when implicit biases kick in,” Professor Kline said. A more formalized hiring process helps overcome this, he said: “Just thinking about things, which steps to take, having to run something by someone for approval, can be quite important in mitigating bias.”

At Sysco, a wholesale restaurant food distributor, which showed no racial bias in the study, a centralized recruitment team reviews résumés and decides whom to call. “Consistency in how we review candidates, with a focus on the requirements of the position, is key,” said Ron Phillips, Sysco’s chief human resources officer. “It lessens the opportunity for personal viewpoints to rise in the process.”

Another important factor is diversity among the people hiring, said Paula Hubbard, the chief human resources officer at McLane Company. It procures, stores and delivers products for large chains like Walmart, and showed no racial bias in the study. Around 40 percent of the company’s recruiters are people of color, and 60 percent are women.

Diversifying the pool of people who apply also helps, H.R. officials said. McLane goes to events for women in trucking and puts up billboards in Spanish.

So does hiring based on skills rather than degrees. While McLane used to require a college degree for many roles, it changed that practice after determining that specific skills mattered more for warehousing or driving jobs. “We now do that for all our jobs: Is there truly a degree required?” Ms. Hubbard said. “Why? Does it make sense? Is experience enough?”

Hilton, another company that showed no racial bias in the study, also stopped requiring degrees for many jobs, in 2018.

Another factor associated with less bias in hiring, the new study found, was more regulatory scrutiny — like at federal contractors, or companies with more Labor Department citations.

Finally, more profitable companies were less biased, in line with a long-held economics theory by the Nobel Prize winner Gary Becker that discrimination is bad for business. Economists said that could be because the more profitable companies benefit from a more diverse set of employees. Or it could be an indication that they had more efficient business processes, in H.R. and elsewhere.

Claire Cain Miller writes about gender, families and the future of work for The Upshot. She joined The Times in 2008 and was part of a team that won a Pulitzer Prize in 2018 for public service for reporting on workplace sexual harassment issues.

Josh Katz is a graphics editor for The Upshot, where he covers a range of topics involving politics, policy and culture. He is the author of “Speaking American: How Y’all, Youse, and You Guys Talk,” a visual exploration of American regional dialects.


ORIGINAL RESEARCH article

This article is part of the research topic “Mendelian Randomization and Cardiovascular Diseases.”

No causal association between the volume of strenuous exercise and coronary atherosclerosis: A two-sample Mendelian randomization study (provisionally accepted)

  • 1 Nanfang Hospital, Southern Medical University, China
  • 2 School of Traditional Chinese Medicine, Southern Medical University, China
  • 3 Second Clinical Medical College, Guangzhou University of Traditional Chinese Medicine, China

The final, formatted version of the article will be published soon.

Objective: Several observational studies have shown that high-volume, high-intensity exercise training increases the prevalence and severity of coronary atherosclerosis, but the causal effect remains uncertain. This study aims to explore the causal relationship between the volume of strenuous exercise (SE) and coronary atherosclerosis (CA) using the Mendelian randomization (MR) method.

Methods: The exposure factors were two basic parameters of the volume of strenuous exercise (duration and frequency of strenuous exercise), and the outcome factor was coronary atherosclerosis. The relevant genetic loci were extracted from genome-wide association study (GWAS) summary data as instrumental variables, and MR analyses were performed using the inverse-variance weighted (IVW) method, the weighted median method, and the MR-Egger method. Sensitivity analyses were performed using heterogeneity analysis, pleiotropy analysis, and the leave-one-out method. The original results were tested against other coronary atherosclerosis data sets.

Results: IVW results showed no causal association between coronary atherosclerosis and either duration of strenuous exercise (DOSE) [OR = 0.9937, 95% CI (0.9847, 1.0028), P = 0.1757] or frequency of strenuous exercise (FOSE) in the last 4 weeks [OR = 0.9930, 95% CI (0.9808, 1.0054), P = 0.2660]. All of the above results were validated with other coronary atherosclerosis data sets.

Conclusion: The present study found no causal association of the duration or frequency of SE with CA, and provides valuable insight into choosing a scientifically appropriate volume of SE for cardiac rehabilitation (CR).
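For readers new to the method, the inverse-variance weighted (IVW) estimate combines per-SNP Wald ratios (SNP–outcome effect divided by SNP–exposure effect) into a single causal estimate. The sketch below shows that calculation on hypothetical summary statistics; the authors would have used established MR software, so treat this as an illustration of the formula rather than their pipeline.

```python
# Minimal sketch of a two-sample MR inverse-variance weighted (IVW) estimate.
# Inputs are hypothetical per-SNP GWAS summary statistics for the instruments:
# effects on the exposure (beta_exp), effects on the outcome (beta_out, on a
# log-odds scale) and the outcome standard errors (se_out).
import numpy as np

def ivw_estimate(beta_exp, beta_out, se_out):
    beta_exp, beta_out, se_out = map(np.asarray, (beta_exp, beta_out, se_out))
    ratio = beta_out / beta_exp               # per-SNP Wald ratios
    ratio_se = se_out / np.abs(beta_exp)      # first-order standard errors
    w = 1.0 / ratio_se ** 2                   # inverse-variance weights
    est = np.sum(w * ratio) / np.sum(w)       # pooled causal estimate
    se = np.sqrt(1.0 / np.sum(w))
    return est, se, np.exp(est)               # exp(est) is the odds ratio
```

An estimate close to zero on the log-odds scale corresponds to an odds ratio near 1, consistent with the null results reported above.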

Keywords: Mendelian randomization, the volume of strenuous exercise, Coronary atherosclerosis, Cardiac Rehabilitation, Genome-Wide Association Study, High-intensity interval training

Received: 26 Nov 2023; Accepted: 11 Apr 2024.

Copyright: © 2024 Xiao, Huang, Li, Wang, Zheng, Li, Gong, Lv and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Mx. Jingjun Li, Nanfang Hospital, Southern Medical University, Guangzhou, 510515, Guangdong Province, China



How to Paraphrase | Step-by-Step Guide & Examples

Published on April 8, 2022 by Courtney Gahan and Jack Caulfield. Revised on June 1, 2023.

Paraphrasing means putting someone else’s ideas into your own words. Paraphrasing a source involves changing the wording while preserving the original meaning.

Paraphrasing is an alternative to  quoting (copying someone’s exact words and putting them in quotation marks ). In academic writing, it’s usually better to integrate sources by paraphrasing instead of quoting. It shows that you have understood the source, reads more smoothly, and keeps your own voice front and center.

Every time you paraphrase, it’s important to cite the source . Also take care not to use wording that is too similar to the original. Otherwise, you could be at risk of committing plagiarism .


Table of contents

  • How to paraphrase in five easy steps
  • How to paraphrase correctly
  • Examples of paraphrasing
  • How to cite a paraphrase
  • Paraphrasing vs. quoting
  • Paraphrasing vs. summarizing
  • Avoiding plagiarism when you paraphrase
  • Other interesting articles
  • Frequently asked questions about paraphrasing

If you’re struggling to get to grips with the process of paraphrasing, check out our easy step-by-step guide in the video below.


Putting an idea into your own words can be easier said than done. Let’s say you want to paraphrase the text below, about population decline in a particular species of sea snails.

Incorrect paraphrasing

You might make a first attempt to paraphrase it by swapping out a few words for  synonyms .

Like other sea creatures inhabiting the vicinity of highly populated coasts, horse conchs have lost substantial territory to advancement and contamination , including preferred breeding grounds along mud flats and seagrass beds. Their Gulf home is also heating up due to global warming , which scientists think further puts pressure on the creatures , predicated upon the harmful effects extra warmth has on other large mollusks (Barnett, 2022).

This attempt at paraphrasing doesn’t change the sentence structure or order of information, only some of the word choices. And the synonyms chosen are poor:

  • “Advancement and contamination” doesn’t really convey the same meaning as “development and pollution.”
  • Sometimes the changes make the tone less academic: “home” for “habitat” and “sea creatures” for “marine animals.”
  • Adding phrases like “inhabiting the vicinity of” and “puts pressure on” makes the text needlessly long-winded.
  • Global warming is related to climate change, but they don’t mean exactly the same thing.

Because of this, the text reads awkwardly, is longer than it needs to be, and remains too close to the original phrasing. This means you risk being accused of plagiarism .

Correct paraphrasing

Let’s look at a more effective way of paraphrasing the same text.

Here, we’ve:

  • Only included the information that’s relevant to our argument (note that the paraphrase is shorter than the original)
  • Introduced the information with the signal phrase “Scientists believe that …”
  • Retained key terms like “development and pollution,” since changing them could alter the meaning
  • Structured sentences in our own way instead of copying the structure of the original
  • Started from a different point, presenting information in a different order

Because of this, we’re able to clearly convey the relevant information from the source without sticking too close to the original phrasing.


Once you have your perfectly paraphrased text, you need to ensure you credit the original author. You’ll always paraphrase sources in the same way, but you’ll have to use a different type of in-text citation depending on what citation style you follow.


It’s a good idea to paraphrase instead of quoting in most cases because:

  • Paraphrasing shows that you fully understand the meaning of a text
  • Your own voice remains dominant throughout your paper
  • Quotes reduce the readability of your text

But that doesn’t mean you should never quote. Quotes are appropriate when:

  • Giving a precise definition
  • Saying something about the author’s language or style (e.g., in a literary analysis paper)
  • Providing evidence in support of an argument
  • Critiquing or analyzing a specific claim

A paraphrase puts a specific passage into your own words. It’s typically a similar length to the original text, or slightly shorter.

When you boil a longer piece of writing down to the key points, so that the result is a lot shorter than the original, this is called summarizing .

Paraphrasing and quoting are important tools for presenting specific information from sources. But if the information you want to include is more general (e.g., the overarching argument of a whole article), summarizing is more appropriate.

When paraphrasing, you have to be careful to avoid accidental plagiarism .

This can happen if the paraphrase is too similar to the original quote, with phrases or whole sentences that are identical (and should therefore be in quotation marks). It can also happen if you fail to properly cite the source.

Paraphrasing tools are widely used by students, and can be especially useful for non-native speakers who may find academic writing particularly challenging. While they can be helpful for a bit of extra inspiration, use such tools sparingly, keeping academic integrity in mind.

To make sure you’ve properly paraphrased and cited all your sources, you could elect to run a plagiarism check before submitting your paper. And of course, always be sure to read your source material yourself and take the first stab at paraphrasing on your own.

If you want to know more about ChatGPT, AI tools, citation, and plagiarism, make sure to check out some of our other articles with explanations and examples.

  • ChatGPT vs human editor
  • ChatGPT citations
  • Is ChatGPT trustworthy?
  • Using ChatGPT for your studies
  • What is ChatGPT?
  • Chicago style
  • Critical thinking

 Plagiarism

  • Types of plagiarism
  • Self-plagiarism
  • Avoiding plagiarism
  • Academic integrity
  • Consequences of plagiarism
  • Common knowledge

To paraphrase effectively, don’t just take the original sentence and swap out some of the words for synonyms. Instead, try:

  • Reformulating the sentence (e.g., change active to passive , or start from a different point)
  • Combining information from multiple sentences into one
  • Leaving out information from the original that isn’t relevant to your point
  • Using synonyms where they don’t distort the meaning

The main point is to ensure you don’t just copy the structure of the original text, but instead reformulate the idea in your own words.

Paraphrasing without crediting the original author is a form of plagiarism , because you’re presenting someone else’s ideas as if they were your own.

However, paraphrasing is not plagiarism if you correctly cite the source . This means including an in-text citation and a full reference, formatted according to your required citation style .

As well as citing, make sure that any paraphrased text is completely rewritten in your own words.

Plagiarism means using someone else’s words or ideas and passing them off as your own. Paraphrasing means putting someone else’s ideas in your own words.

So when does paraphrasing count as plagiarism?

  • Paraphrasing is plagiarism if you don’t properly credit the original author.
  • Paraphrasing is plagiarism if your text is too close to the original wording (even if you cite the source). If you directly copy a sentence or phrase, you should quote it instead.
  • Paraphrasing  is not plagiarism if you put the author’s ideas completely in your own words and properly cite the source .


To present information from other sources in academic writing , it’s best to paraphrase in most cases. This shows that you’ve understood the ideas you’re discussing and incorporates them into your text smoothly.

It’s appropriate to quote when:

  • Changing the phrasing would distort the meaning of the original text
  • You want to discuss the author’s language choices (e.g., in literary analysis )
  • You’re presenting a precise definition
  • You’re looking in depth at a specific claim

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation below.

Gahan, C. & Caulfield, J. (2023, June 01). How to Paraphrase | Step-by-Step Guide & Examples. Scribbr. Retrieved April 9, 2024, from https://www.scribbr.com/working-with-sources/how-to-paraphrase/


