
Exploring Standard Deviation: Statistics and Data Analysis Made Easy

You’ll learn the fundamentals of standard deviation and its importance in data analysis, interpretation techniques, and practical applications.

  • Assess data spread using standard deviation around the mean.
  • Lower standard deviations reveal consistent datasets.
  • Evaluate risk in finance with standard deviation.
  • Some interpretations assume normal data distribution.
  • Outliers impact the reliability of standard deviation.

What is Standard Deviation?

Standard deviation measures the variation or dispersion in a set of values. It is used to quantify the spread of data points in a dataset relative to the mean (average) value. A low standard deviation indicates that the data points are close to the mean, while a high standard deviation shows that the data points are more widely spread.

By providing insights into the variability of a dataset, standard deviation helps researchers and analysts assess the reliability and consistency of data, identify patterns and trends, and make informed decisions based on the data’s distribution.

Standard Deviation Importance

Standard deviation is crucial in statistics and data analysis for understanding the variability of a dataset. It helps identify trends, assess data reliability, detect outliers, compare datasets, and evaluate risk. A high standard deviation indicates a larger spread of values. In contrast, a low standard deviation shows that the values are more tightly clustered around the mean.

Standard Deviation Applications

The standard deviation has multiple applications across various industries and fields. For example, it is used in finance and investment to measure volatility, in manufacturing to monitor product quality, in social sciences to analyze data from surveys or experiments, in sports to assess athlete performance, in medicine to evaluate treatment outcomes, and in weather and climate analysis to identify patterns and trends.


How to Calculate Standard Deviation

Calculating standard deviation can be broken down into the following steps:

1. Compute the mean (average) of the dataset:

  • Add up all the values in the dataset.
  • Divide the sum by the number of values in the dataset.

2. Subtract the mean from each data point:

  • For each value in the dataset, subtract the mean calculated in step 1.

3. Square the differences:

  • Take the difference calculated in step 2 for each data point and square it.

4. Calculate the mean of the squared differences:

  • Add up all the squared differences from step 3.
  • Divide the sum by the number of squared differences.

Note: If you’re working with a sample rather than an entire population, divide by (number of squared differences – 1) instead of the number of squared differences to get an unbiased estimate of the population variance.

5. Take the square root of the mean of squared differences:

  • The square root of the result from step 4 is the standard deviation of the dataset.

Consider the dataset: [3, 6, 9, 12, 15]

Step 1: Calculate the mean.  Mean = (3 + 6 + 9 + 12 + 15) / 5 = 45 / 5 = 9

Step 2: Subtract the mean from each data point. Differences: [-6, -3, 0, 3, 6]

Step 3: Square the differences. Squared differences: [36, 9, 0, 9, 36]

Step 4: Calculate the mean of the squared differences. Mean of squared differences = (36 + 9 + 0 + 9 + 36) / 5 = 90 / 5 = 18

Step 5: Take the square root of the mean of squared differences. Standard deviation = √18 ≈ 4.24

So, the standard deviation for this dataset is approximately 4.24.
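To make the steps concrete, here is a minimal Python sketch of the calculation above, using the same dataset; the final lines also apply the sample adjustment from the note in step 4.

```python
# A minimal sketch of the five steps above, in plain Python, using the same dataset.
data = [3, 6, 9, 12, 15]

mean = sum(data) / len(data)                  # Step 1: mean = 9.0
diffs = [x - mean for x in data]              # Step 2: [-6, -3, 0, 3, 6]
squared = [d ** 2 for d in diffs]             # Step 3: [36, 9, 0, 9, 36]
variance = sum(squared) / len(squared)        # Step 4: 90 / 5 = 18.0
std_dev = variance ** 0.5                     # Step 5: sqrt(18) ≈ 4.24

print(round(std_dev, 2))                      # 4.24

# Per the note in step 4: for a sample, divide by (n - 1) instead.
sample_sd = (sum(squared) / (len(squared) - 1)) ** 0.5
print(round(sample_sd, 2))                    # ≈ 4.74
```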

How to Interpret Standard Deviation?

Interpreting standard deviation involves understanding what the value represents in the context of the data being analyzed. Here are some general guidelines for interpreting standard deviation:

Measure of dispersion

Standard deviation quantifies the spread of data points in a dataset. A higher standard deviation indicates a greater degree of dispersion or variability in the data, while a lower standard deviation suggests that the data points are more tightly clustered around the mean.

Context-dependent interpretation

The interpretation of standard deviation depends on the context and domain in which it is being used. A high standard deviation may be acceptable in specific fields. In contrast, a low standard deviation may be more desirable in other areas. For example, in finance, a high standard deviation may indicate higher risk, while in quality control, a low standard deviation indicates consistency in the production process.

Relative to the mean

The standard deviation value should be interpreted relative to the mean of the dataset. In some cases, it may be useful to compute the coefficient of variation (CV), the ratio of the standard deviation to the mean. The CV is a dimensionless measure that helps compare the degree of variation across datasets with different units or widely varying means.
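As a quick illustration, here is a short Python sketch computing the CV for two hypothetical datasets that have the same absolute spread but very different means.

```python
# Sketch: the CV compares relative spread across hypothetical datasets
# whose absolute spread is identical but whose means differ.
def coefficient_of_variation(data):
    mean = sum(data) / len(data)
    sd = (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5
    return sd / mean

heights_cm = [160, 170, 180]  # hypothetical values
weights_kg = [60, 70, 80]     # same absolute spread, smaller mean

print(round(coefficient_of_variation(heights_cm), 3))  # ≈ 0.048
print(round(coefficient_of_variation(weights_kg), 3))  # ≈ 0.117
```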

Empirical rule

The empirical rule (also known as the 68-95-99.7 rule) can help interpret standard deviation for datasets that follow a normal distribution. According to this rule, approximately 68% of the data falls within one standard deviation from the mean, about 95% within two standard deviations, and around 99.7% within three standard deviations.
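The rule is easy to check empirically. The sketch below simulates normally distributed data (the mean and standard deviation are arbitrary choices) and counts the share of values within one, two, and three standard deviations.

```python
# Sketch: checking the 68-95-99.7 rule on simulated normal data
# (mean and standard deviation below are arbitrary choices).
import random

random.seed(0)
data = [random.gauss(mu=100, sigma=15) for _ in range(100_000)]
mean = sum(data) / len(data)
sd = (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5

for k in (1, 2, 3):
    within = sum(mean - k * sd <= x <= mean + k * sd for x in data) / len(data)
    print(f"within {k} SD: {within:.1%}")  # ≈ 68.3%, 95.4%, 99.7%
```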

Identifying outliers

When interpreting standard deviation, it’s essential to consider the presence of outliers, which can significantly impact the value. Outliers are data points that deviate substantially from the mean and may require further investigation to determine their cause.
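The following sketch illustrates the point with hypothetical data: adding a single extreme value to an otherwise consistent dataset inflates the standard deviation dramatically.

```python
# Sketch: a single outlier can inflate the standard deviation dramatically
# (hypothetical measurements).
import statistics

clean = [9, 10, 10, 11, 10, 9, 11, 10]
with_outlier = clean + [50]

print(round(statistics.pstdev(clean), 2))         # ≈ 0.71
print(round(statistics.pstdev(with_outlier), 2))  # ≈ 12.59
```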

Standard Deviation Limitations

Standard deviation is a valuable measure of dispersion. Still, it has limitations, including sensitivity to outliers, the assumption of normal distribution (when applicable), incomparability across different units, and interpretation challenges. Other measures of dispersion, graphical methods, or additional descriptive statistics may be necessary for specific situations.

Standard Deviation Considerations

When using standard deviation for certain statistical analyses, it is crucial to consider the following factors to ensure accurate and meaningful insights into the data’s variability:

Scale of measurement: The data is measured on an interval or ratio scale.

Validity of the mean: The mean is a valid measure of central tendency.

Normal distribution assumption (when applicable): The data follows a normal distribution. This assumption is relevant for specific statistical tests and methods that involve the standard deviation.

Independence of observations: The data points are independent of each other.

Homoscedasticity (when applicable): The variability in the data is constant across different levels of the independent variable(s). This assumption is relevant when using the standard deviation in linear regression and other parametric analyses.

Understanding these factors is essential when using standard deviation in statistical analysis to ensure accurate and meaningful insights into the data’s variability.

When to Use Standard Deviation

Consider using standard deviation when quantifying dispersion or comparing the variability between datasets with similar means. It’s also helpful for assessing data consistency, evaluating risk or volatility in finance, and analyzing normally distributed data. However, keep in mind its limitations and use other measures of dispersion for skewed or non-interval data. Use standard deviation alongside other statistical tools to comprehensively understand your data.

Key Information on Standard Deviation

Standard deviation is a fundamental measure in statistics and data analysis that quantifies the dispersion of data points in a dataset relative to the mean. It plays a vital role in various fields and industries, helping professionals understand the variability of data, assess its reliability, and make informed decisions. However, it’s essential to be aware of its limitations and the importance of context when interpreting standard deviation. By combining standard deviation with other statistical tools and methods, researchers and analysts can gain a comprehensive understanding of their data and derive valuable insights for decision-making processes.



A beginner’s guide to standard deviation and standard error

Posted on 26th September 2018 by Eveliina Ilola


What is standard deviation?

Standard deviation tells you how spread out the data is. It is a measure of how far each observed value is from the mean. In a normal distribution, about 95% of values will be within 2 standard deviations of the mean.


How to calculate standard deviation

Standard deviation is rarely calculated by hand. It can, however, be done using the formula below, where x represents a value in a data set, μ represents the mean of the data set and N represents the number of values in the data set.

\[\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}\]

The steps in calculating the standard deviation are as follows:

  • For each value, find its distance to the mean
  • For each value, find the square of this distance
  • Find the sum of these squared values
  • Divide the sum by the number of values in the data set
  • Find the square root of this result
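For those who prefer software to hand calculation, here is a short sketch of the formula above using NumPy (assuming NumPy is available); the ddof argument selects between the population and sample versions.

```python
# Sketch of the formula above with NumPy (assuming NumPy is installed).
# ddof=0 divides by N (population); ddof=1 divides by N - 1 (sample estimate).
import numpy as np

x = np.array([3, 6, 9, 12, 15])
print(np.sqrt(np.mean((x - x.mean()) ** 2)))  # the five steps in one line: ≈ 4.24
print(x.std(ddof=0))                          # same result via the built-in
print(x.std(ddof=1))                          # ≈ 4.74, the sample version
```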

What is standard error?

When you are conducting research, you often only collect data from a small sample of the whole population. Because of this, you are likely to end up with slightly different sets of values with slightly different means each time.

If you take enough samples from a population, the means will be arranged into a distribution around the true population mean. The standard deviation of this distribution, i.e. the standard deviation of sample means, is called the standard error.

The standard error tells you how accurate the mean of any given sample from that population is likely to be compared to the true population mean. When the standard error increases, i.e. the means are more spread out, it becomes more likely that any given mean is an inaccurate representation of the true population mean.

How to calculate standard error

Standard error can be calculated using the formula below, where σ represents standard deviation and n represents sample size.

\[SE = \frac{\sigma}{\sqrt{n}}\]

Standard error increases when standard deviation, i.e. the variance of the population, increases. Standard error decreases when sample size increases – as the sample size gets closer to the true size of the population, the sample means cluster more and more around the true population mean.
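A small simulation makes this concrete. The sketch below (with hypothetical population parameters) draws repeated samples of increasing size and compares the empirical standard error of the sample means with the σ/√n formula.

```python
# Sketch: drawing repeated samples from a hypothetical population and
# comparing the empirical standard error with the formula sigma / sqrt(n).
import random
import statistics

random.seed(1)
population_mean, population_sd = 50, 10  # hypothetical parameters

for n in (10, 100, 1000):
    means = [statistics.mean(random.gauss(population_mean, population_sd)
                             for _ in range(n))
             for _ in range(2000)]
    print(n,
          round(statistics.stdev(means), 2),    # empirical SE of the means
          round(population_sd / n ** 0.5, 2))   # formula prediction
```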


Standard Deviation

Introduction

The standard deviation is a measure of the spread of scores within a set of data. Usually, we are interested in the standard deviation of a population. However, as we are often presented with data from a sample only, we can estimate the population standard deviation from a sample standard deviation. These two standard deviations - sample and population standard deviations - are calculated differently. In statistics, we are usually presented with having to calculate sample standard deviations, and so this is what this article will focus on, although the formula for a population standard deviation will also be shown.

When to use the sample or population standard deviation

We are normally interested in knowing the population standard deviation because our population contains all the values we are interested in. Therefore, you would normally calculate the population standard deviation if: (1) you have the entire population or (2) you have a sample of a larger population, but you are only interested in this sample and do not wish to generalize your findings to the population. However, in statistics, we are usually presented with a sample from which we wish to estimate (generalize to) a population, and the standard deviation is no exception to this. Therefore, if all you have is a sample, but you wish to make a statement about the population standard deviation from which the sample is drawn, you need to use the sample standard deviation. Confusion can often arise as to which standard deviation to use due to the name "sample" standard deviation incorrectly being interpreted as meaning the standard deviation of the sample itself and not the estimate of the population standard deviation based on the sample.

What type of data should you use when you calculate a standard deviation?

The standard deviation is used in conjunction with the mean to summarise continuous data, not categorical data. In addition, the standard deviation, like the mean, is normally only appropriate when the continuous data is not significantly skewed and does not contain outliers.

Examples of when to use the sample or population standard deviation

Q. A teacher sets an exam for their pupils. The teacher wants to summarize the results the pupils attained as a mean and standard deviation. Which standard deviation should be used?

A. Population standard deviation. Why? Because the teacher is only interested in this class of pupils' scores and nobody else.

Q. A researcher has recruited males aged 45 to 65 years old for an exercise training study to investigate risk markers for heart disease (e.g., cholesterol). Which standard deviation would most likely be used?

A. Sample standard deviation. Although not explicitly stated, a researcher investigating health related issues will not simply be concerned with just the participants of their study; they will want to show how their sample results can be generalised to the whole population (in this case, males aged 45 to 65 years old). Hence, the use of the sample standard deviation.

Q. One of the questions on a national census survey asks for respondents' age. Which standard deviation would be used to describe the variation in all ages received from the census?

A. Population standard deviation. A national census is used to find out information about the nation's citizens. By definition, it includes the whole population. Therefore, a population standard deviation would be used.

What are the formulas for the standard deviation?

The sample standard deviation formula is:

\[s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}\]

The population standard deviation formula is:

\[\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}\]
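If you want to compute these yourself, Python's statistics module implements both formulas directly; the sketch below uses hypothetical exam scores.

```python
# Sketch: Python's statistics module implements both formulas directly
# (the scores are hypothetical).
import statistics

scores = [72, 85, 91, 68, 77]

# Population SD: you have every value you care about (e.g., the whole class).
print(statistics.pstdev(scores))  # divides by n

# Sample SD: the values are used to estimate a wider population.
print(statistics.stdev(scores))   # divides by n - 1
```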



How to Interpret Standard Deviation and Standard Error in Research

Posted on February 28, 2023 by Karen Lynch, Head of Content at Greenbook

Standard Deviation 101

When it comes to aggregating market research, many of us are fairly familiar with mean, median, and mode. However, digging one level deeper into the mean brings us to standard deviation and standard error. Standard deviation offers a variety of insights when it comes to analysis; in business, a standard deviation might imply how risky a venture is. In manufacturing, the standard deviation might reference quality control. So, while standard deviation and standard error are not the most common variables, they’re instrumental in analyzing the confidence surrounding data and results.

What is standard deviation?

Standard deviation is a valuable research tool because it tells you how spread out data is. It measures how far each data point lies from the mean, and it is a descriptive statistic. Descriptive statistics, not surprisingly, describe the features of a data set, including values like distribution, mean, median, mode, and variability. Standard deviation helps summarize data: a high standard deviation signals lots of variability. Standard deviations also give rise to the famous bell curves of data.

“Focusing on the central tendency in data and not considering its diversity can be disastrous. Unless the average is close to 0% or 100%, we can’t assume that the average represents everyone. In fact, it could represent no one. Does a mediocre rating mean that most people think your offering is mediocre, or do some think it’s great while others think it’s terrible? Can you build a business around just the ones who think it is great? Understanding the standard deviation and standard error helps you to identify opportunities you might otherwise overlook.” – Nelson Whipple, GreenBook’s GRIT Research Director

Real-life applications of standard deviation

Standard deviation is not just a mathematical term used for research; it’s often used in everyday, real-life situations. From academic studies to business and finance to weather forecasting and medicine, standard deviation is a useful concept beyond the context of research.

Population traits

For example, if looking at population traits like height, weight, or IQ, standard deviation describes how the data spread around the mean. If the mean IQ is 100, and the standard deviation equation gives us a value of 10, then we know that roughly two-thirds of the population (about 68%) has an IQ between 90 and 110. About 95% of the population lies within two standard deviations of the mean, giving them an IQ of anywhere from 80 to 120.

Financial analysis

Another real-life application is in finance. When it comes to measuring the returns of different financial assets like stocks, bonds, commodities, and real estate, the standard deviation can illustrate how volatile or risky an investment might be.


For example, Stock A and Stock B might have the same annual rate of return of 7%; however, when looking at the standard deviation, Stock A is 2%, and Stock B is 7%. As Stock B has more data points that fall farther from the mean, an investor might receive wildly different returns year to year, making it a more volatile investment. On the other hand, Stock A would most likely have an annual rate of return that is close to 7% every year!
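A short sketch with hypothetical annual returns shows the comparison: both stocks average the same return, but Stock B's standard deviation is several times larger.

```python
# Sketch with hypothetical annual returns: both stocks average 7%, but
# Stock B's returns vary far more around that mean.
import statistics

stock_a = [0.05, 0.07, 0.09, 0.06, 0.08]
stock_b = [-0.02, 0.15, 0.01, 0.14, 0.07]

for name, returns in (("Stock A", stock_a), ("Stock B", stock_b)):
    print(name,
          round(statistics.mean(returns), 2),    # both ≈ 0.07
          round(statistics.stdev(returns), 3))   # ≈ 0.016 vs ≈ 0.076
```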

How to calculate standard deviation

It’s not simple to calculate standard deviation by hand, as it involves a multi-step equation. However, free online calculators make it simple to plug in the values and quickly see a standard deviation number.

What is standard error?

Standard error describes the variability across multiple samples of a population. If you take repeated samples, their means will cluster around the true population mean, and the standard deviation of this distribution of sample means becomes your standard error. Standard error lets researchers know how accurate a sampling of the population is. For example, if you took the means of five different samples, you’d be able to see which samples fell outside the norm. Maybe a sample was biased in some way or failed to hit the normal level of accuracy.

Standard deviation vs. standard error

What’s the difference between standard deviation and standard error? While closely related in survey and market research, standard deviation refers to variability within a single sample, while standard error tells researchers how much means vary across multiple samples. Standard deviation gives you a closer look at an individual sample, while standard error is more useful for multiple sets of data.

“Would you rather know the average increase in property value in your neighborhood or the likelihood that your property’s value will increase by a certain amount? The mean tells you the former, and the standard deviation and standard error help you estimate the latter.” –  Nelson Whipple, GreenBook’s GRIT Research Director

How to calculate standard error

Similar to the standard deviation, the standard error is tough to calculate by hand, but the formula is straightforward: divide the standard deviation by the square root of the sample size. Free online calculators can also do the work for you.

When to use standard deviation and standard error

To determine confidence, volatility, and variability of data, standard deviation and standard error are both helpful tools in survey research and market research. To utilize them in your research, check out a free online calculator to quickly do the work for you.




How do the mean and standard deviation describe data?

The standard deviation is a measurement made in reference to the mean:

  • A large standard deviation indicates that the data points are far from the mean, and a small standard deviation indicates that they are clustered closely around the mean.
  • When deciding whether sample measurements are suitable inferences for the population, the standard deviation of those measurements is of crucial importance.
  • Standard deviations are often used as a measure of risk in finance associated with price-fluctuations of stocks, bonds, etc.

Chebyshev's rule gives a conservative estimate of the percentage of data points captured within a given number of standard deviations of the mean, for any data set: at least \(1 - \frac{1}{k^2}\) of the observations lie within \(k\) standard deviations of the mean, for any \(k > 1\).

Example: A sample of size \(n=50\) has mean \(\bar{x}=28\) and standard deviation \(s=3\). Without knowing anything else about the sample, what can be said about the number of observations that lie in the interval \((22,34)\)? What can be said about the number of observations that lie outside the interval?

The interval \((22,34)\) is formed by adding and subtracting two standard deviations from the mean. By Chebyshev's Theorem, at least \(\frac{3}{4}\) of the data are within this interval. Since \(\frac{3}{4}\) of \(50\) is \(37.5\), this means that at least 37.5 observations are in the interval. But \(.5\) of a measurement does not make sense, so we conclude that at least 38 observations must lie inside the interval \((22,34)\).

If \(\frac{3}{4}\) of the observations are made inside the interval, then \(\frac{1}{4}\) of them are outside. We conclude that at most 12 \((50-38=12)\) observations lie outside the interval \((22,34)\).
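The same calculation generalizes to any sample size and any \(k > 1\); here is a small Python helper sketching it.

```python
# Sketch: the same Chebyshev calculation for any sample size n and any k > 1.
import math

def chebyshev_bound(n, k):
    """At least n * (1 - 1/k**2) observations lie within k SDs of the mean."""
    return math.ceil(n * (1 - 1 / k ** 2))

print(chebyshev_bound(50, 2))       # 38 observations inside (22, 34)
print(50 - chebyshev_bound(50, 2))  # at most 12 outside
```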

There are more accurate ways of calculating the percentage or number of observations that fall within a given number of standard deviations. Chebyshev's Theorem and the empirical rule we'll introduce next are just approximations.

If the histogram of a data set is approximately bell-shaped, we can approximate the percentage of data between standard deviations using the empirical rule: approximately 68% of the data lie within one standard deviation of the mean, about 95% within two, and around 99.7% within three.

Example: Heights of 18-year-old males have a bell-shaped distribution with mean \(69.6\) inches and standard deviation \(1.4\) inches. About what proportion of all such men are between 68.2 and 71 inches tall? And what interval centered on the mean should contain about 95% of all such men?

Since the interval \((68.2,71.0)\) lies within one standard deviation of the mean, by the empirical rule, about 68% of all 18-year-old males have heights in this range.

By the empirical rule, 95% corresponds to plus or minus two standard deviations from the mean.

\[\bar{x} \pm 2s = 69.6 \pm 2(1.4) = 66.8,\,72.4\]

Therefore, about 95% of such men are between 66.8 inches and 72.4 inches tall.



The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data . It is an important research tool used by scientists, governments, businesses, and other organizations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process . You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organize and summarize the data using descriptive statistics . Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalize your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Table of contents

  • Step 1: Write your hypotheses and plan your research design
  • Step 2: Collect data from a sample
  • Step 3: Summarize your data with descriptive statistics
  • Step 4: Test hypotheses or make estimates with inferential statistics
  • Step 5: Interpret your results

Step 1: Write your hypotheses and plan your research design

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population . You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

  • Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
  • Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
  • Null hypothesis: Parental income and GPA have no relationship with each other in college students.
  • Alternative hypothesis: Parental income and GPA are positively correlated in college students.

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

  • In an experimental design , you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
  • In a correlational design , you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
  • In a descriptive design , you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

  • In a between-subjects design , you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
  • In a within-subjects design , you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
  • In a mixed (factorial) design , one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).
Example: Experimental research design
First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test. In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design
In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalize your variables and decide exactly how you will measure them.

For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:

  • Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g. level of language ability).
  • Quantitative data represents amounts. These may be on an interval scale (e.g. test score) or a ratio scale (e.g. age).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.


Step 2: Collect data from a sample

Population vs sample

In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures . You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample.

  • Probability sampling: every member of the population has a chance of being selected for the study through random selection.
  • Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.

In theory, for highly generalizable findings, you should use a probability sampling method. Random selection reduces several types of research bias , like sampling bias , and ensures that data from your sample is actually typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.

But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more at risk for biases like self-selection bias, they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that:

  • your sample is representative of the population you’re generalizing your findings to.
  • your sample lacks systematic bias.

Keep in mind that external validity means that you can only generalize your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialized, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalized in your discussion section .

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

  • Will you have resources to advertise your study widely, including outside of your university setting?
  • Will you have the means to recruit a diverse sample that represents a broad population?
  • Do you have time to contact and follow up with members of hard-to-reach groups?

Example: Sampling (experimental study)
Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study)
Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, a minimum of 30 units or more per subgroup is necessary.

To use these calculators, you have to understand and input these key components:

  • Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Statistical power : the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
  • Expected effect size : a standardized indication of how large the expected result of your study will be, usually based on other similar studies.
  • Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
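As one concrete illustration, here is a sketch using the statsmodels library (one tool among many; its availability is assumed here) to solve for the per-group sample size of a two-group t test from the components listed above.

```python
# Sketch using the statsmodels library (assumed available) to solve for the
# per-group sample size of a two-group t test from the components above.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,  # expected effect size (Cohen's d; an assumed value)
    alpha=0.05,       # significance level
    power=0.80,       # statistical power
)
print(round(n_per_group))  # ≈ 64 participants per group
```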

Step 3: Summarize your data with descriptive statistics

Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarize them.

Inspect your data

There are various ways to inspect your data, including the following:

  • Organizing data from each variable in frequency distribution tables .
  • Displaying data from a key variable in a bar chart to view the distribution of responses.
  • Visualizing the relationship between two variables using a scatter plot .

By visualizing your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

Mean, median, mode, and standard deviation in a normal distribution

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:

  • Mode : the most popular response or value in the data set.
  • Median : the value in the exact middle of the data set when ordered from low to high.
  • Mean : the sum of all values divided by the number of values.

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:

  • Range : the highest value minus the lowest value of the data set.
  • Interquartile range : the range of the middle half of the data set.
  • Standard deviation : the average distance between each value in your data set and the mean.
  • Variance : the square of the standard deviation.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
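Here is a brief NumPy sketch (with hypothetical data) computing the four measures just listed.

```python
# Sketch: the four measures of variability above, computed with NumPy
# on hypothetical data.
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

data_range = data.max() - data.min()                      # range: 7
iqr = np.percentile(data, 75) - np.percentile(data, 25)   # interquartile range: 1.5
sd = data.std(ddof=1)                                     # sample standard deviation ≈ 2.14
variance = data.var(ddof=1)                               # variance = sd ** 2 ≈ 4.57
print(data_range, iqr, round(sd, 2), round(variance, 2))
```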

Example: Descriptive statistics (experimental study)
After tabulating descriptive statistics for pretest and posttest scores, you should check whether the units of the statistics are comparable across groups. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test. From the descriptive statistics, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study)
After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA. It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

Step 4: Test hypotheses or make estimates with inferential statistics

A number that describes a sample is called a statistic, while a number describing a population is called a parameter. Using inferential statistics, you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences in statistics.

  • Estimation: calculating population parameters based on sample statistics.
  • Hypothesis testing: a formal process for testing research predictions about the population using samples.

You can make two types of estimates of population parameters from sample statistics:

  • A point estimate : a value that represents your best guess of the exact parameter.
  • An interval estimate : a range of values that represent your best guess of where the parameter lies.

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
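A minimal sketch, assuming hypothetical measurements and the conventional z score of 1.96 for a 95% interval:

```python
# A minimal sketch, assuming hypothetical measurements and the conventional
# z score of 1.96 for a 95% confidence interval.
import statistics

sample = [52, 48, 55, 47, 51, 50, 53, 49, 54, 46]
mean = statistics.mean(sample)
se = statistics.stdev(sample) / len(sample) ** 0.5  # standard error of the mean

lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(f"95% CI: ({lower:.1f}, {upper:.1f})")  # ≈ (48.6, 52.4)
```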

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:

  • A test statistic tells you how much your data differs from the null hypothesis of the test.
  • A p value tells you the likelihood of obtaining your results if the null hypothesis is actually true in the population.

Statistical tests come in three main varieties:

  • Comparison tests assess group differences in outcomes.
  • Regression tests assess cause-and-effect relationships between variables.
  • Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in outcome variable(s).

  • A simple linear regression includes one predictor variable and one outcome variable.
  • A multiple linear regression includes two or more predictor variables and one outcome variable.

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

  • A t test is for exactly 1 or 2 groups when the sample is small (30 or less).
  • A z test is for exactly 1 or 2 groups when the sample is large.
  • An ANOVA is for 3 or more groups.

The z and t tests have subtypes based on the number and types of samples and the hypotheses:

  • If you have only one sample that you want to compare to a population mean, use a one-sample test .
  • If you have paired measurements (within-subjects design), use a dependent (paired) samples test .
  • If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test .
  • If you expect a difference between groups in a specific direction, use a one-tailed test .
  • If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test .

The only parametric correlation test is Pearson’s r . The correlation coefficient ( r ) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.
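In practice these two steps are often done together; for example, SciPy's pearsonr returns both the coefficient and a two-sided p value in one call (the data below are hypothetical).

```python
# Sketch: SciPy's pearsonr returns both the correlation coefficient and the
# p value of its (two-sided) significance test; the data are hypothetical.
from scipy.stats import pearsonr

income = [30, 45, 52, 61, 78, 85, 90]      # hypothetical, in $1,000s
gpa = [2.8, 3.0, 3.1, 3.3, 3.5, 3.4, 3.8]

r, p = pearsonr(income, gpa)
print(round(r, 2), round(p, 4))  # strong positive r, small p value
```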

Example: t test (experimental study)
You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:

  • a t value (test statistic) of 3.00
  • a p value of 0.0028

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:

  • a t value of 3.08
  • a p value of 0.001


Step 5: Interpret your results

The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

Example: Interpret your results (experimental study)
You compare your p value of 0.0028 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study)
You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.

In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper .

Example: Effect size (experimental study)
With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores.

Example: Effect size (correlational study)
To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.
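As an illustration, here is a sketch computing Cohen's d for a pretest/posttest comparison using the pooled standard deviation; the scores are hypothetical.

```python
# Sketch: Cohen's d for a pretest/posttest comparison, using the pooled
# standard deviation (scores below are hypothetical).
import statistics

pretest = [65, 70, 72, 68, 74, 66, 71, 69]
posttest = [72, 75, 78, 74, 80, 71, 77, 76]

pooled_sd = ((statistics.variance(pretest) + statistics.variance(posttest)) / 2) ** 0.5
d = (statistics.mean(posttest) - statistics.mean(pretest)) / pooled_sd
print(round(d, 2))  # the standardized difference between the two means
```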

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimize the risk of these errors by selecting an optimal significance level and ensuring high power . However, there’s a trade-off between the two errors, so a fine balance is necessary.

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasizes null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

A Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis, rather than issuing a binary decision about rejecting the null hypothesis.


2.4: Applications of Standard Deviation


Learning Objectives

  • To learn what the value of the standard deviation of a data set implies about how the data scatter away from the mean, as described by the Empirical Rule and Chebyshev’s Theorem.
  • To use the Empirical Rule and Chebyshev’s Theorem to draw conclusions about a data set.

You probably have a good intuitive grasp of what the average of a data set says about that data set. In this section we begin to learn what the standard deviation has to tell us about the nature of the data set.

The Empirical Rule

We start by examining a specific set of data. Table \(\PageIndex{1}\) shows the heights in inches of \(100\) randomly selected adult men. A relative frequency histogram for the data is shown in Figure \(\PageIndex{1}\). The mean and standard deviation of the data are, rounded to two decimal places, \(\bar{x}=69.92\) and \(s = 1.70\).

If we go through the data and count the number of observations that are within one standard deviation of the mean, that is, between \(69.92-1.70=68.22\) and \(69.92+1.70=71.62\) inches, there are \(69\) of them. If we count the number of observations that are within two standard deviations of the mean, that is, between \(69.92-2(1.70)=66.52\) and \(69.92+2(1.70)=73.32\) inches, there are \(95\) of them. All of the measurements are within three standard deviations of the mean, that is, between \(69.92-3(1.70)=64.82\) and \(69.92+3(1.70)=75.02\) inches. These tallies are not coincidences, but are in agreement with the following result, which has been found to be widely applicable.

Heights of Adult Men

If a data set has an approximately bell-shaped relative frequency histogram, then (Figure \(\PageIndex{2}\)):

  • approximately \(68\%\) of the data lie within one standard deviation of the mean, that is, in the interval with endpoints \(\bar{x}\pm s\) for samples and with endpoints \(\mu \pm \sigma\) for populations;
  • approximately \(95\%\) of the data lie within two standard deviations of the mean, that is, in the interval with endpoints \(\bar{x}\pm 2s\) for samples and with endpoints \(\mu \pm 2\sigma\) for populations; and
  • approximately \(99.7\%\) of the data lie within three standard deviations of the mean, that is, in the interval with endpoints \(\bar{x}\pm 3s\) for samples and with endpoints \(\mu \pm 3\sigma\) for populations.


Two key points in regard to the Empirical Rule are that the data distribution must be approximately bell-shaped and that the percentages are only approximately true. The Empirical Rule does not apply to data sets with severely asymmetric distributions, and the actual percentage of observations in any of the intervals specified by the rule could be either greater or less than those given in the rule. We see this with the example of the heights of the men: the Empirical Rule suggested 68 observations between \(68.22\) and \(71.62\) inches, but we counted \(69\).

Example \(\PageIndex{1}\)

Heights of \(18\)-year-old males have a bell-shaped distribution with mean \(69.6\) inches and standard deviation \(1.4\) inches.

  • About what proportion of all such men are between \(68.2\) and \(71\) inches tall?
  • What interval centered on the mean should contain about \(95\%\) of all such men?

A sketch of the distribution of heights is given in Figure \(\PageIndex{3}\).

  • Since the interval from \(68.2\) to \(71.0\) has endpoints \(\bar{x}-s\) and \(\bar{x}+s\), by the Empirical Rule about \(68\%\) of all \(18\)-year-old males should have heights in this range.
  • By the Empirical Rule the shortest such interval has endpoints \(\bar{x}-2s\) and \(\bar{x}+2s\). Since \[\bar{x}-2s=69.6-2(1.4)=66.8 \nonumber \] and \[ \bar{x}+2s=69.6+2(1.4)=72.4 \nonumber \]

the interval in question is the interval from \(66.8\) inches to \(72.4\) inches.
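The Empirical Rule is easy to verify numerically. A minimal Python sketch, simulating normally distributed heights with this example’s mean and standard deviation (the simulated data are an assumption, not the textbook’s table):

```python
import numpy as np

rng = np.random.default_rng(0)
heights = rng.normal(loc=69.6, scale=1.4, size=100_000)  # simulated heights

mean, sd = heights.mean(), heights.std()
for k in (1, 2, 3):
    within = np.mean((heights > mean - k * sd) & (heights < mean + k * sd))
    print(f"within {k} SD: {within:.3f}")  # approx. 0.683, 0.954, 0.997
```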

Distribution of Heights

Example \(\PageIndex{2}\)

Scores on IQ tests have a bell-shaped distribution with mean \(\mu =100\) and standard deviation \(\sigma =10\). Discuss what the Empirical Rule implies concerning individuals with IQ scores of \(110\), \(120\), and \(130\).

A sketch of the IQ distribution is given in Figure \(\PageIndex{3}\). The Empirical Rule states that

  • approximately \(68\%\) of the IQ scores in the population lie between \(90\) and \(110\),
  • approximately \(95\%\) of the IQ scores in the population lie between \(80\) and \(120\), and
  • approximately \(99.7\%\) of the IQ scores in the population lie between \(70\) and \(130\).

Distribution of IQ Scores

  • Since \(68\%\) of the IQ scores lie within the interval from \(90\) to \(110\), it must be the case that \(32\%\) lie outside that interval. By symmetry approximately half of that \(32\%\), or \(16\%\) of all IQ scores, will lie above \(110\). If \(16\%\) lie above \(110\), then \(84\%\) lie below. We conclude that the IQ score \(110\) is the \(84^{th}\) percentile.
  • The same analysis applies to the score \(120\). Since approximately \(95\%\) of all IQ scores lie within the interval from \(80\) to \(120\), only \(5\%\) lie outside it, and half of them, or \(2.5\%\) of all scores, are above \(120\). The IQ score \(120\) is thus higher than \(97.5\%\) of all IQ scores, and is quite a high score.
  • By a similar argument, only \(15/100\) of \(1\%\) of all adults, or about one or two in every thousand, would have an IQ score above \(130\). This fact makes the score \(130\) extremely high.

Chebyshev’s Theorem

The Empirical Rule does not apply to all data sets, only to those that are bell-shaped, and even then is stated in terms of approximations. A result that applies to every data set is known as Chebyshev’s Theorem.

For any numerical data set,

  • at least \(3/4\) of the data lie within two standard deviations of the mean, that is, in the interval with endpoints \(\bar{x}\pm 2s\) for samples and with endpoints \(\mu \pm 2\sigma\) for populations;
  • at least \(8/9\) of the data lie within three standard deviations of the mean, that is, in the interval with endpoints \(\bar{x}\pm 3s\) for samples and with endpoints \(\mu \pm 3\sigma\) for populations;
  • at least \(1-1/k^2\) of the data lie within \(k\) standard deviations of the mean, that is, in the interval with endpoints \(\bar{x}\pm ks\) for samples and with endpoints \(\mu \pm k\sigma\) for populations, where \(k\) is any positive whole number that is greater than \(1\).

Figure \(\PageIndex{4}\) gives a visual illustration of Chebyshev’s Theorem.

Chebyshev’s Theorem

It is important to pay careful attention to the words “at least” at the beginning of each of the three parts of Chebyshev’s Theorem. The theorem gives the minimum proportion of the data which must lie within a given number of standard deviations of the mean; the true proportions found within the indicated regions could be greater than what the theorem guarantees.

Example \(\PageIndex{3}\)

A sample of size \(n=50\) has mean \(\bar{x}=28\) and standard deviation \(s=3\). Without knowing anything else about the sample, what can be said about the number of observations that lie in the interval \((22,34)\)? What can be said about the number of observations that lie outside that interval?

The interval \((22,34)\) is the one that is formed by adding and subtracting two standard deviations from the mean. By Chebyshev’s Theorem, at least \(3/4\) of the data are within this interval. Since \(3/4\) of \(50\) is \(37.5\), this means that at least \(37.5\) observations are in the interval. But one cannot take a fractional observation, so we conclude that at least \(38\) observations must lie inside the interval \((22,34)\).

If at least \(3/4\) of the observations are in the interval, then at most \(1/4\) of them are outside it. Since \(1/4\) of \(50\) is \(12.5\), at most \(12.5\) observations are outside the interval. Since again a fraction of an observation is impossible, we conclude that at most \(12\) observations lie outside the interval \((22,34)\).
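The bound itself is a one-line function. A small sketch applying it to this example’s sample size (chebyshev_min is a hypothetical helper name):

```python
import math

def chebyshev_min(k: float) -> float:
    """Minimum proportion of data within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k**2

n = 50
for k in (2, 3):
    prop = chebyshev_min(k)
    print(f"k={k}: at least {prop:.3f} of the data, "
          f"i.e. at least {math.ceil(prop * n)} of {n} observations")
```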


Example \(\PageIndex{4}\)

The number of vehicles passing through a busy intersection between \(8:00\; a.m.\) and \(10:00\; a.m.\) was observed and recorded on every weekday morning of the last year. The data set contains \(n=251\) numbers. The sample mean is \(\bar{x}=725\) and the sample standard deviation is \(s=25\). Identify which of the following statements must be true.

  • On approximately \(95\%\) of the weekday mornings last year the number of vehicles passing through the intersection from \(8:00\; a.m.\) to \(10:00\; a.m.\) was between \(675\) and \(775\).
  • On at least \(75\%\) of the weekday mornings last year the number of vehicles passing through the intersection from \(8:00\; a.m.\) to \(10:00\; a.m.\) was between \(675\) and \(775\).
  • On at least \(189\) weekday mornings last year the number of vehicles passing through the intersection from \(8:00\; a.m.\) to \(10:00\; a.m.\) was between \(675\) and \(775\).
  • On at most \(25\%\) of the weekday mornings last year the number of vehicles passing through the intersection from \(8:00\; a.m.\) to \(10:00\; a.m.\) was either less than \(675\) or greater than \(775\).
  • On at most \(12.5\%\) of the weekday mornings last year the number of vehicles passing through the intersection from \(8:00\; a.m.\) to \(10:00\; a.m.\) was less than \(675\).
  • On at most \(25\%\) of the weekday mornings last year the number of vehicles passing through the intersection from \(8:00\; a.m.\) to \(10:00\; a.m.\) was less than \(675\).
  • Since it is not stated that the relative frequency histogram of the data is bell-shaped, the Empirical Rule does not apply. Statement (1) is based on the Empirical Rule and therefore it might not be correct.
  • Statement (2) is a direct application of part (1) of Chebyshev’s Theorem because \((\bar{x}-2s,\, \bar{x}+2s) = (675,775)\). It must be correct.
  • Statement (3) says the same thing as statement (2) because \(75\%\) of \(251\) is \(188.25\), so the minimum whole number of observations in this interval is \(189\). Thus statement (3) is definitely correct.
  • Statement (4) says the same thing as statement (2) but in different words, and therefore is definitely correct.
  • Statement (4), which is definitely correct, states that at most \(25\%\) of the time either fewer than \(675\) or more than \(775\) vehicles passed through the intersection. Statement (5) says that half of that \(25\%\) corresponds to days of light traffic. This would be correct if the relative frequency histogram of the data were known to be symmetric. But this is not stated; perhaps all of the observations outside the interval \((675,775)\) are less than \(675\). Thus statement (5) might not be correct.
  • Statement (4) is definitely correct and statement (4) implies statement (6): even if every measurement that is outside the interval (\(675,775\)) is less than \(675\) (which is conceivable, since symmetry is not known to hold), even so at most \(25\%\) of all observations are less than \(675\). Thus statement (6) must definitely be correct.

Key Takeaway

  • The Empirical Rule is an approximation that applies only to data sets with a bell-shaped relative frequency histogram. It estimates the proportion of the measurements that lie within one, two, and three standard deviations of the mean.
  • Chebyshev’s Theorem is a fact that applies to all possible data sets. It describes the minimum proportion of the measurements that must lie within two, three, or more standard deviations of the mean.


Basic statistical tools in research and data analysis

Zulfiqar Ali

Department of Anaesthesiology, Division of Neuroanaesthesiology, Sheri Kashmir Institute of Medical Sciences, Soura, Srinagar, Jammu and Kashmir, India

S Bala Bhaskar

Department of Anaesthesiology and Critical Care, Vijayanagar Institute of Medical Sciences, Bellary, Karnataka, India

Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretations and reporting the research findings. Statistical analysis gives meaning to meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.

INTRODUCTION

Statistics is a branch of science that deals with the collection, organisation, analysis of data and drawing of inferences from the samples to the whole population.[ 1 ] This requires a proper design of the study, an appropriate selection of the study sample and choice of a suitable statistical test. An adequate knowledge of statistics is necessary for proper designing of an epidemiological study or a clinical trial. Improper statistical methods may result in erroneous conclusions which may lead to unethical practice.[ 2 ]

A variable is a characteristic that varies from one individual member of a population to another.[3] Variables such as height and weight are measured by some type of scale, convey quantitative information and are called quantitative variables. Sex and eye colour give qualitative information and are called qualitative variables[3] [Figure 1].

[Figure 1: Classification of variables]

Quantitative variables

Quantitative or numerical data are subdivided into discrete and continuous measurements. Discrete numerical data are recorded as a whole number such as 0, 1, 2, 3,… (integer), whereas continuous data can assume any value. Observations that can be counted constitute the discrete data and observations that can be measured constitute the continuous data. Examples of discrete data are number of episodes of respiratory arrests or the number of re-intubations in an intensive care unit. Similarly, examples of continuous data are the serial serum glucose levels, partial pressure of oxygen in arterial blood and the oesophageal temperature.

A hierarchical scale of increasing precision can be used for observing and recording data, based on categorical, ordinal, interval and ratio scales [Figure 1].

Categorical or nominal variables are unordered. The data are merely classified into categories and cannot be arranged in any particular order. If only two categories exist (as in gender: male and female), it is called dichotomous (or binary) data. The various causes of re-intubation in an intensive care unit, such as upper airway obstruction, impaired clearance of secretions, hypoxemia, hypercapnia, pulmonary oedema and neurological impairment, are examples of categorical variables.

Ordinal variables have a clear ordering, but the ordered data may not have equal intervals. Examples are the American Society of Anesthesiologists status and the Richmond Agitation-Sedation Scale.

Interval variables are similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced. A good example of an interval scale is the Fahrenheit degree scale used to measure temperature. With the Fahrenheit scale, the difference between 70° and 75° is equal to the difference between 80° and 85°: The units of measurement are equal throughout the full range of the scale.

Ratio scales are similar to interval scales, in that equal differences between scale values have equal quantitative meaning. However, ratio scales also have a true zero point, which gives them an additional property. The system of centimetres is an example of a ratio scale: there is a true zero point, and a value of 0 cm means a complete absence of length. The thyromental distance of 6 cm in an adult may be twice that of a child in whom it may be 3 cm.

STATISTICS: DESCRIPTIVE AND INFERENTIAL STATISTICS

Descriptive statistics[4] try to describe the relationship between variables in a sample or population. Descriptive statistics provide a summary of data in the form of mean, median and mode. Inferential statistics[4] use a random sample of data taken from a population to describe and make inferences about the whole population. It is valuable when it is not possible to examine each member of an entire population. Examples of descriptive and inferential statistics are illustrated in Table 1.

[Table 1: Examples of descriptive and inferential statistics]

Descriptive statistics

The extent to which the observations cluster around a central location is described by the central tendency and the spread towards the extremes is described by the degree of dispersion.

Measures of central tendency

The measures of central tendency are mean, median and mode.[6] The mean (or arithmetic average) is the sum of all the scores divided by the number of scores. The mean may be influenced profoundly by extreme values. For example, the average stay of organophosphorus poisoning patients in the ICU may be influenced by a single patient who stays in the ICU for around 5 months because of septicaemia. Such extreme values are called outliers. The formula for the mean is

\[\bar{x} = \frac{\sum x}{n}\]

where \(x\) = each observation and \(n\) = number of observations. The median[6] is defined as the middle of a distribution in ranked data (with half of the values in the sample above and half below the median value), while the mode is the most frequently occurring value in a distribution. The range defines the spread, or variability, of a sample.[7] It is described by the minimum and maximum values of the variables. If we rank the data and then group the observations into percentiles, we get better information about the pattern of spread of the variables. In percentiles, we rank the observations into 100 equal parts. We can then describe the 25th, 50th, 75th or any other percentile. The median is the 50th percentile. The interquartile range is the middle 50% of the observations about the median (25th-75th percentile). Variance[7] is a measure of how spread out the distribution is. It gives an indication of how closely an individual observation clusters about the mean value. The variance of a population is defined by the following formula:

\[\sigma^2 = \frac{\sum (X_i - X)^2}{N}\]

where \(\sigma^2\) is the population variance, \(X\) is the population mean, \(X_i\) is the \(i\)th element from the population and \(N\) is the number of elements in the population. The variance of a sample is defined by a slightly different formula:

\[s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}\]

where \(s^2\) is the sample variance, \(\bar{x}\) is the sample mean, \(x_i\) is the \(i\)th element from the sample and \(n\) is the number of elements in the sample. The formula for the variance of a population has the value \(N\) as the denominator. The expression \(n-1\) is known as the degrees of freedom and is one less than the number of observations: each observation is free to vary, except the last one, which must take a defined value. The variance is measured in squared units. To make the interpretation of the data simple and to retain the basic unit of observation, the square root of the variance is used. The square root of the variance is the standard deviation (SD).[8] The SD of a population is defined by the following formula:

\[\sigma = \sqrt{\frac{\sum (X_i - X)^2}{N}}\]

where \(\sigma\) is the population SD, \(X\) is the population mean, \(X_i\) is the \(i\)th element from the population and \(N\) is the number of elements in the population. The SD of a sample is defined by a slightly different formula:

\[s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}\]

where \(s\) is the sample SD, \(\bar{x}\) is the sample mean, \(x_i\) is the \(i\)th element from the sample and \(n\) is the number of elements in the sample. An example of the calculation of variance and SD is illustrated in Table 2.

[Table 2: Example of mean, variance and standard deviation]
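These formulas translate directly into code. A minimal sketch with made-up observations, showing the population (divide by \(N\)) and sample (divide by \(n-1\)) versions side by side:

```python
import numpy as np

x = np.array([4.0, 8.0, 6.0, 5.0, 3.0, 7.0])  # illustrative observations

mean = x.mean()
pop_var = ((x - mean) ** 2).sum() / len(x)         # population: divide by N
samp_var = ((x - mean) ** 2).sum() / (len(x) - 1)  # sample: divide by n - 1

print(f"mean = {mean:.2f}")
print(f"population variance = {pop_var:.3f}, SD = {pop_var ** 0.5:.3f}")
print(f"sample variance = {samp_var:.3f}, SD = {samp_var ** 0.5:.3f}")

# NumPy equivalents: ddof=0 gives the population SD, ddof=1 the sample SD
assert np.isclose(x.std(ddof=0), pop_var ** 0.5)
assert np.isclose(x.std(ddof=1), samp_var ** 0.5)
```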

Normal distribution or Gaussian distribution

Most biological variables usually cluster around a central value, with symmetrical positive and negative deviations about this point.[1] The standard normal distribution curve is a symmetrical bell-shaped curve. In a normal distribution curve, about 68% of the scores fall within 1 SD of the mean, around 95% within 2 SDs and about 99.7% within 3 SDs [Figure 2].

[Figure 2: Normal distribution curve]

Skewed distribution

It is a distribution with an asymmetry of the variables about its mean. In a negatively skewed distribution [Figure 3], the mass of the distribution is concentrated on the right of the figure, leading to a longer left tail. In a positively skewed distribution [Figure 3], the mass of the distribution is concentrated on the left of the figure, leading to a longer right tail.

[Figure 3: Curves showing negatively skewed and positively skewed distributions]

Inferential statistics

In inferential statistics, data are analysed from a sample to make inferences in the larger collection of the population. The purpose is to answer or test the hypotheses. A hypothesis (plural hypotheses) is a proposed explanation for a phenomenon. Hypothesis tests are thus procedures for making rational decisions about the reality of observed effects.

Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between 0 and 1 (where 0 indicates impossibility and 1 indicates certainty).

In inferential statistics, the term ‘null hypothesis’ ( H 0 ‘ H-naught ,’ ‘ H-null ’) denotes that there is no relationship (difference) between the population variables in question.[ 9 ]

The alternative hypothesis (H1 or Ha) denotes that a relationship between the variables is expected to exist.[9]

The P value (or the calculated probability) is the probability of the observed result occurring by chance if the null hypothesis is true. The P value is a number between 0 and 1 and is interpreted by researchers in deciding whether to reject or retain the null hypothesis [Table 3].

[Table 3: P values with interpretation]

If the P value is less than the arbitrarily chosen value (known as α or the significance level), the null hypothesis (H0) is rejected [Table 4]. However, if the null hypothesis (H0) is incorrectly rejected, this is known as a Type I error.[11] Further details regarding alpha error, beta error and sample size calculation and factors influencing them are dealt with in another section of this issue by Das S et al.[12]

[Table 4: Illustration for null hypothesis]

PARAMETRIC AND NON-PARAMETRIC TESTS

Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests.[ 13 ]

Two most basic prerequisites for parametric statistical analysis are:

  • The assumption of normality which specifies that the means of the sample group are normally distributed
  • The assumption of equal variance which specifies that the variances of the samples and of their corresponding population are equal.

However, if the distribution of the sample is skewed towards one side or the distribution is unknown due to the small sample size, non-parametric[ 14 ] statistical techniques are used. Non-parametric tests are used to analyse ordinal and categorical data.

Parametric tests

The parametric tests assume that the data are on a quantitative (numerical) scale, with a normal distribution of the underlying population. The samples have the same variance (homogeneity of variances). The samples are randomly drawn from the population, and the observations within a group are independent of each other. The commonly used parametric tests are the Student's t -test, analysis of variance (ANOVA) and repeated measures ANOVA.

Student's t -test

Student's t -test is used to test the null hypothesis that there is no difference between the means of the two groups. It is used in three circumstances:

  • To test if a sample mean (as an estimate of a population mean) differs significantly from a given population mean (the one-sample t-test). The formula is:

\[t = \frac{\bar{X} - u}{SE}\]

where \(\bar{X}\) = sample mean, \(u\) = population mean and SE = standard error of the mean.

  • To test if the population means estimated by two independent samples differ significantly (the unpaired t-test). The formula is:

\[t = \frac{\bar{X}_1 - \bar{X}_2}{SE}\]

where \(\bar{X}_1 - \bar{X}_2\) is the difference between the means of the two groups and SE denotes the standard error of the difference.

  • To test if the population means estimated by two dependent samples differ significantly (the paired t -test). A usual setting for paired t -test is when measurements are made on the same subjects before and after a treatment.

The formula for paired t -test is:

\[t = \frac{\bar{d}}{SE}\]

where \(\bar{d}\) is the mean difference and SE denotes the standard error of this difference.
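All three circumstances are covered by standard SciPy routines. A minimal sketch with made-up measurements (the arrays and the reference mean are assumptions for illustration):

```python
import numpy as np
from scipy import stats

before = np.array([7.1, 6.8, 7.4, 7.9, 6.5, 7.2])  # e.g., pre-treatment values
after  = np.array([6.4, 6.1, 7.0, 7.1, 6.0, 6.6])  # same subjects after treatment
other  = np.array([7.8, 7.5, 8.1, 7.2, 7.9, 8.0])  # an independent group

# One-sample t-test: does the mean of `before` differ from a population mean of 7.0?
print(stats.ttest_1samp(before, popmean=7.0))

# Unpaired (independent samples) t-test
print(stats.ttest_ind(before, other))

# Paired t-test: the same subjects measured twice
print(stats.ttest_rel(before, after))
```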

The group variances can be compared using the F-test. The F-test is the ratio of the variances (var 1/var 2). If F differs significantly from 1.0, then it is concluded that the group variances differ significantly.

Analysis of variance

The Student's t -test cannot be used for comparison of three or more groups. The purpose of ANOVA is to test if there is any significant difference between the means of two or more groups.

In ANOVA, we study two variances – (a) between-group variability and (b) within-group variability. The within-group variability (error variance) is the variation that cannot be accounted for in the study design. It is based on random differences present in our samples.

However, the between-group (or effect variance) is the result of our treatment. These two estimates of variances are compared using the F-test.

A simplified formula for the F statistic is:

\[F = \frac{MS_b}{MS_w}\]

where MS b is the mean squares between the groups and MS w is the mean squares within groups.
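A one-way ANOVA follows the same pattern; scipy.stats.f_oneway returns the F statistic and its p value. The three groups below are made up for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical measurements from three treatment groups
group_a = np.array([23, 25, 21, 27, 24])
group_b = np.array([30, 28, 33, 29, 31])
group_c = np.array([26, 24, 28, 25, 27])

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```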

Repeated measures analysis of variance

As with ANOVA, repeated measures ANOVA analyses the equality of means of three or more groups. However, a repeated measure ANOVA is used when all variables of a sample are measured under different conditions or at different points in time.

As the variables are measured from a sample at different points of time, the measurement of the dependent variable is repeated. Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures: The data violate the ANOVA assumption of independence. Hence, in the measurement of repeated dependent variables, repeated measures ANOVA should be used.

Non-parametric tests

When the assumptions of normality are not met, and the sample means are not normally distributed, parametric tests can lead to erroneous results. Non-parametric (distribution-free) tests are used in such situations as they do not require the normality assumption.[15] Non-parametric tests may fail to detect a significant difference when compared with a parametric test; that is, they usually have less power.

As is done for the parametric tests, the test statistic is compared with known values for the sampling distribution of that statistic and the null hypothesis is accepted or rejected. The types of non-parametric analysis techniques and the corresponding parametric analysis techniques are delineated in Table 5 .

[Table 5: Analogues of parametric and non-parametric tests]

Median test for one sample: The sign test and Wilcoxon's signed rank test

The sign test and Wilcoxon's signed rank test are used for median tests of one sample. These tests examine whether one instance of sample data is greater or smaller than the median reference value.

This test examines the hypothesis about the median \(\theta_0\) of a population. It tests the null hypothesis \(H_0: \theta = \theta_0\). When the observed value \(X_i\) is greater than the reference value \(\theta_0\), it is marked as +. If the observed value is smaller than the reference value, it is marked as −. If the observed value is equal to the reference value \(\theta_0\), it is eliminated from the sample.

If the null hypothesis is true, there will be an equal number of + signs and − signs.

The sign test ignores the actual values of the data and only uses + or − signs. Therefore, it is useful when it is difficult to measure the values.

Wilcoxon's signed rank test

There is a major limitation of sign test as we lose the quantitative information of the given data and merely use the + or – signs. Wilcoxon's signed rank test not only examines the observed values in comparison with θ0 but also takes into consideration the relative sizes, adding more statistical power to the test. As in the sign test, if there is an observed value that is equal to the reference value θ0, this observed value is eliminated from the sample.

Wilcoxon's rank sum test ranks all data points in order, calculates the rank sum of each sample and compares the difference in the rank sums.

Mann-Whitney test

It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other.

The Mann–Whitney test compares all data \(x_i\) belonging to the X group and all data \(y_i\) belonging to the Y group and calculates the probability of \(x_i\) being greater than \(y_i\): \(P(x_i > y_i)\). The null hypothesis states that \(P(x_i > y_i) = P(x_i < y_i) = 1/2\), while the alternative hypothesis states that \(P(x_i > y_i) \neq 1/2\).

Kolmogorov-Smirnov test

The two-sample Kolmogorov-Smirnov (KS) test was designed as a generic method to test whether two random samples are drawn from the same distribution. The null hypothesis of the KS test is that both distributions are identical. The statistic of the KS test is a distance between the two empirical distributions, computed as the maximum absolute difference between their cumulative curves.
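These rank-based tests are also available in SciPy. A sketch on made-up samples (illustrative data only):

```python
import numpy as np
from scipy import stats

x = np.array([1.2, 2.4, 1.9, 3.1, 2.2, 1.7, 2.8])
y = np.array([2.9, 3.4, 2.6, 3.8, 3.1, 2.7, 3.5])

# Wilcoxon's signed rank test (paired differences x - y)
print(stats.wilcoxon(x, y))

# Mann-Whitney U test (two independent samples)
print(stats.mannwhitneyu(x, y))

# Two-sample Kolmogorov-Smirnov test (are the distributions identical?)
print(stats.ks_2samp(x, y))
```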

Kruskal-Wallis test

The Kruskal–Wallis test is a non-parametric test to analyse the variance.[ 14 ] It analyses if there is any difference in the median values of three or more independent samples. The data values are ranked in an increasing order, and the rank sums calculated followed by calculation of the test statistic.

Jonckheere test

In contrast to the Kruskal–Wallis test, the Jonckheere test assumes an a priori ordering of the groups, which gives it more statistical power than the Kruskal–Wallis test.[14]

Friedman test

The Friedman test is a non-parametric test for testing the difference between several related samples. It is an alternative to repeated measures ANOVA, used when the same parameter has been measured under different conditions on the same subjects.[13]

Tests to analyse the categorical data

The Chi-square test, Fisher's exact test and McNemar's test are used to analyse categorical or nominal variables. The Chi-square test compares the frequencies and tests whether the observed data differ significantly from the expected data if there were no differences between groups (i.e., the null hypothesis). It is calculated as the sum of the squared difference between observed (O) and expected (E) data (or the deviation, d) divided by the expected data, by the following formula:

\[\chi^2 = \sum \frac{(O - E)^2}{E}\]

A Yates correction factor is used when the sample size is small. Fisher's exact test is used to determine if there are non-random associations between two categorical variables. It does not assume random sampling, and instead of referring a calculated statistic to a sampling distribution, it calculates an exact probability. McNemar's test is used for paired nominal data. It is applied to a 2 × 2 table with paired-dependent samples and is used to determine whether the row and column frequencies are equal (that is, whether there is ‘marginal homogeneity’). The null hypothesis is that the paired proportions are equal. The Mantel-Haenszel Chi-square test is a multivariate test, as it analyses multiple grouping variables. It stratifies according to the nominated confounding variables and identifies any that affect the primary outcome variable. If the outcome variable is dichotomous, then logistic regression is used.
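A chi-square test of independence on a hypothetical 2 × 2 contingency table (the counts are made up). scipy.stats.chi2_contingency derives the expected counts from the table's margins and, for 2 × 2 tables, applies Yates' correction by default:

```python
import numpy as np
from scipy import stats

# Hypothetical counts: rows = treatment/control, columns = improved/not improved
table = np.array([[30, 10],
                  [18, 22]])

chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
print("expected counts:\n", expected)
```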

SOFTWARES AVAILABLE FOR STATISTICS, SAMPLE SIZE CALCULATION AND POWER ANALYSIS

Numerous statistical software systems are available currently. The commonly used systems are Statistical Package for the Social Sciences (SPSS, manufactured by IBM Corporation), Statistical Analysis System (SAS, developed by SAS Institute, North Carolina, United States of America), R (designed by Ross Ihaka and Robert Gentleman of the R core team), Minitab (developed by Minitab Inc.), Stata (developed by StataCorp) and MS Excel (developed by Microsoft).

There are a number of web resources which are related to statistical power analyses. A few are:

  • StatPages.net – provides links to a number of online power calculators
  • G-Power – provides a downloadable power analysis program that runs under DOS
  • Power analysis for ANOVA designs – an interactive site that calculates power or the sample size needed to attain a given power for one effect in a factorial ANOVA design
  • SPSS makes a program called SamplePower. It gives an output of a complete report on the computer screen which can be cut and pasted into another document.

It is important that a researcher knows the concepts of the basic statistical methods used in the conduct of a research study. This will help in conducting an appropriately well-designed study leading to valid and reliable results. Inappropriate use of statistical techniques may lead to faulty conclusions, inducing errors and undermining the significance of the article. Bad statistics may lead to bad research, and bad research may lead to unethical practice. Hence, adequate knowledge of statistics and the appropriate use of statistical tests are important. An appropriate knowledge of the basic statistical methods will go a long way in improving research designs and producing quality medical research which can be utilised for formulating evidence-based guidelines.

Financial support and sponsorship

Conflicts of interest

There are no conflicts of interest.



Calculating standard deviation step by step


Introduction

Overview of how to calculate standard deviation.

\[\text{SD} = \sqrt{\frac{\sum |x - \mu|^2}{N}}\]

An important note

The formula above is for the population standard deviation; if the data are a sample from a larger population, divide by \(n-1\) rather than \(N\).

Step-by-step interactive example for calculating standard deviation.

Data set: 6, 2, 3, 1

Step 1: Finding \(\mu\) in \(\sqrt{\frac{\sum |x-\mu|^2}{N}}\)


Step 2: Finding \(|x-\mu|^2\) in \(\sqrt{\frac{\sum |x-\mu|^2}{N}}\)

Step 3: Finding \(\sum |x-\mu|^2\) in \(\sqrt{\frac{\sum |x-\mu|^2}{N}}\)

Step 4: Finding \(\frac{\sum |x-\mu|^2}{N}\) in \(\sqrt{\frac{\sum |x-\mu|^2}{N}}\)

Step 5: Finding the standard deviation \(\sqrt{\frac{\sum |x-\mu|^2}{N}}\)

Summary of what we did

\[\mu = \frac{6+2+3+1}{4} = \frac{12}{4} = 3\]

\[\text{SD} = \sqrt{\frac{\sum |x-\mu|^2}{N}} = \sqrt{\frac{9+1+0+4}{4}} = \sqrt{3.5} \approx 1.87\]

In words: sum the squares of the distances (Step 3), divide by the number of data points (Step 4), and take the square root (Step 5).
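The same five steps can be written out as a small Python function and checked against the worked example above:

```python
import math

def population_sd(data):
    mu = sum(data) / len(data)               # Step 1: the mean
    squared = [(x - mu) ** 2 for x in data]  # Step 2: squared distances
    total = sum(squared)                     # Step 3: sum them
    mean_sq = total / len(data)              # Step 4: divide by N
    return math.sqrt(mean_sq)                # Step 5: square root

print(round(population_sd([6, 2, 3, 1]), 2))  # 1.87
```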

Try it yourself

Data set: 1, 4, 7, 2, 6

Find the mean.

Find the square of the distances from each of the data points to the mean, then apply the formula.



Data Analysis in Research: Types & Methods


Content Index

  • What is data analysis in research?
  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

What is data analysis in research?

Definition of research in data analysis: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process. The first is data organization. The second is data reduction through summarization and categorization, which helps find patterns and themes in the data for easy identification and linking. The third is data analysis itself, which researchers do in both top-down and bottom-up fashion.


On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that data analysis and interpretation is a process representing the application of deductive and inductive logic to the research.

Researchers rely heavily on data as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But, what if there is no question to ask? Well! It is possible to explore data even without a problem – we call it ‘Data Mining’, which often reveals some interesting patterns within the data that are worth exploring.

Irrelevant to the type of data researchers explore, their mission and audiences’ vision guide them to find the patterns to shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and remain unbiased toward unexpected patterns, expressions, and results. Remember, sometimes, data analysis tells the most unforeseen yet exciting stories that were not expected when initiating data analysis. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research. 


Data describes things by assigning specific values to them. For analysis, you need to organize these values and process and present them in a given context to make them useful. Data can come in different forms; here are the primary data types.

  • Qualitative data: When the data presented has words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze in research, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews, qualitative observation or open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: age, rank, cost, length, weight, scores, etc. all come under this type of data. You can present such data in graphical format or charts, or apply statistical analysis methods to this data. The OMS (Outcomes Measurement Systems) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: Data presented in groups. An item included in categorical data cannot belong to more than one group. Example: a survey respondent describing their living style, marital status, smoking habit, or drinking habit provides categorical data. A chi-square test is a standard method used to analyze this data.


Data analysis in qualitative research

Data analysis in qualitative research works a little differently from numerical data, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insight from such complicated information is a complicated process; hence it is typically used for exploratory research and data analysis.

Although there are several ways to find patterns in textual information, a word-based method is the most relied upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is largely manual: researchers usually read the available data and find repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find  “food”  and  “hunger” are the most commonly used words and will highlight them for further analysis.


The keyword context is another widely used word-based technique. In this method, the researcher tries to understand the concept by analyzing the context in which the participants use a particular keyword.  

For example , researchers conducting research and data analysis for studying the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’

The scrutiny-based technique is also one of the highly recommended text analysis methods used to identify patterns in qualitative data. Compare and contrast is the most widely used method under this technique, differentiating how one piece of text is similar to or different from another.

For example, to find out the “importance of a resident doctor in a company,” the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast is the best method to analyze polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable Partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations from the enormous data.


There are several techniques to analyze the data in qualitative research, but here are some commonly used methods,

  • Content Analysis:  It is widely accepted and the most frequently employed technique for data analysis in research methodology. It can be used to analyze the documented information from text, images, and sometimes from the physical items. It depends on the research questions to predict when and where to use this method.
  • Narrative Analysis: This method is used to analyze content gathered from various sources such as personal interviews, field observation, and surveys. Most of the time, the stories or opinions shared by people are examined for answers to the research questions.
  • Discourse Analysis:  Similar to narrative analysis, discourse analysis is used to analyze the interactions with people. Nevertheless, this particular method considers the social context under which or within which the communication between the researcher and respondent takes place. In addition to that, discourse analysis also focuses on the lifestyle and day-to-day environment while deriving any conclusion.
  • Grounded Theory:  When you want to explain why a particular phenomenon happened, then using grounded theory for analyzing quality data is the best resort. Grounded theory is applied to study data about the host of similar cases occurring in different settings. When researchers are using this method, they might alter explanations or produce new ones until they arrive at some conclusion.


Data analysis in quantitative research

The first stage in quantitative research and data analysis is to prepare the data for analysis so that nominal data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to understand whether the collected data sample meets the pre-set standards or is a biased sample. It is divided into four stages:

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent has answered all the questions in an online survey or, in an interview, that the interviewer asked every question devised in the questionnaire.

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in some fields incorrectly or skip them accidentally. Data editing is a process wherein the researchers have to confirm that the provided data is free of such errors. They need to conduct necessary checks, including outlier checks, to edit the raw data and make it ready for analysis.

Phase III: Data Coding

Out of all three, this is the most critical phase of data preparation, associated with grouping and assigning values to the survey responses. If a survey is completed with a 1000-respondent sample size, the researcher will create age brackets to distinguish the respondents by age. It thus becomes easier to analyze small data buckets rather than deal with the massive data pile.


After the data is prepared for analysis, researchers are open to using different research and data analysis methods to derive meaningful insights. Statistical analysis plans are the most favored way to analyze numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. The methods are classified into two groups: ‘descriptive statistics’, used to describe data, and ‘inferential statistics’, which help in comparing the data.

Descriptive statistics

This method is used to describe the basic features of versatile types of data in research. It presents the data in such a meaningful way that pattern in the data starts making sense. Nevertheless, the descriptive analysis does not go beyond making conclusions. The conclusions are again based on the hypothesis researchers have formulated so far. Here are a few major types of descriptive analysis methods.

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.

Measures of Central Tendency

  • Mean, Median, Mode
  • The method is widely used to locate the centre of a distribution.
  • Researchers use this method when they want to showcase the most common or average response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range equals the difference between the high and low points.
  • Variance and standard deviation measure how far observed scores fall from the mean.
  • It is used to identify the spread of scores by stating intervals.
  • Researchers use this method to showcase how spread out the data are and how strongly that spread affects the mean.

Measures of Position

  • Percentile ranks, Quartile ranks
  • It relies on standardized scores helping researchers to identify the relationship between different scores.
  • It is often used when researchers want to compare scores with the average count.

For quantitative research, descriptive analysis often gives absolute numbers, but those numbers alone are never sufficient to demonstrate the rationale behind them. Nevertheless, it is necessary to think of the best method for research and data analysis suiting your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students’ average scores in schools. It is better to rely on descriptive statistics when the researchers intend to keep the research or outcome limited to the provided sample without generalizing it. For example, when you want to compare average voting done in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.
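As a sketch, most of these descriptive measures are one-liners with Python's standard statistics module; the scores below are made up for illustration:

```python
import statistics

scores = [62, 75, 75, 81, 68, 90, 73, 85, 79, 70]  # illustrative data

print("mean     :", statistics.mean(scores))
print("median   :", statistics.median(scores))
print("mode     :", statistics.mode(scores))
print("range    :", max(scores) - min(scores))
print("variance :", statistics.variance(scores))  # sample variance (n - 1)
print("std dev  :", statistics.stdev(scores))     # sample standard deviation
print("quartiles:", statistics.quantiles(scores, n=4))
```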

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of the representing population’s collected sample. For example, you can ask some odd 100 audiences at a movie theater if they like the movie they are watching. Researchers then use inferential statistics on the collected  sample  to reason that about 80-90% of people like the movie. 

Here are two significant areas of inferential statistics.

  • Estimating parameters: It takes statistics from the sample research data and demonstrates something about the population parameter.
  • Hypothesis test: It's about using sample research data to answer the survey research questions. For example, researchers might be interested in understanding whether a recently launched shade of lipstick is well received, or whether multivitamin capsules help children perform better at games.
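To ground the movie-theater example, here is a minimal sketch of parameter estimation: a 95% confidence interval for the share of viewers who like the film. The sample counts are the illustrative numbers above, and the 1.96 critical value assumes a large-sample normal approximation.

```python
# A minimal sketch of parameter estimation, assuming 80 of 100
# sampled viewers said they liked the movie (illustrative numbers).
import math

n, liked = 100, 80
p_hat = liked / n                        # sample proportion (point estimate)
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
margin = 1.96 * se                       # 95% margin, normal approximation

print(f"estimated share who like the movie: {p_hat:.0%}")
print(f"95% confidence interval: {p_hat - margin:.0%} to {p_hat + margin:.0%}")
```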

These are sophisticated analysis methods used to showcase the relationship between different variables instead of describing a single variable. It is often used when researchers want something beyond absolute numbers to understand the relationship between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but are interested in understanding the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables, cross-tabulation is used to analyze the relationship between multiple variables. Suppose the provided data has age and gender categories presented in rows and columns; a two-dimensional cross-tabulation shows the number of males and females in each age category, making the analysis seamless.
  • Regression analysis: To understand the strength of the relationship between two variables, researchers rarely look beyond regression analysis, the primary and most commonly used method, which is also a type of predictive analysis. In this method, you have an essential factor called the dependent variable, along with one or more independent variables, and you work out the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to be ascertained in an error-free, random manner. (A minimal sketch follows this list.)
  • Frequency tables: A frequency table summarizes how often each value or response occurs in a dataset. It gives researchers a simple first look at the distribution of responses before deeper analysis.
  • Analysis of variance (ANOVA): This statistical procedure is used to test the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation suggests that the research findings are significant. In many contexts, ANOVA testing and variance analysis are similar.
  • Researchers must have the necessary research skills to analyze and manipulate the data, and should be trained to demonstrate a high standard of research practice. Ideally, researchers should possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Usually, research and data analytics projects differ by scientific discipline; therefore, getting statistical advice at the beginning of analysis helps design a survey questionnaire, select data collection methods, and choose samples.
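As referenced in the regression item above, here is a minimal single-predictor sketch. The ad-spend and sales figures are invented, and statistics.linear_regression requires Python 3.10 or later; a real analysis would also examine residuals and fit quality.

```python
# A minimal sketch, assuming made-up ad-spend and sales figures.
# statistics.linear_regression requires Python 3.10+.
import statistics

ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]   # independent variable
sales = [2.1, 3.9, 6.2, 7.8, 10.1]     # dependent variable

slope, intercept = statistics.linear_regression(ad_spend, sales)
print(f"fitted line: sales = {intercept:.2f} + {slope:.2f} * ad_spend")
```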


  • The primary aim of data research and analysis is to derive ultimate insights that are unbiased. Any mistake in, or bias while, collecting data, selecting an analysis method, or choosing an audience sample will lead to a biased inference.
  • No level of sophistication in research data analysis is enough to rectify poorly defined objectives or outcome measurements. Whether the design is at fault or the intentions are unclear, a lack of clarity can mislead readers, so avoid the practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find ways to deal with everyday challenges like outliers, missing data, data alteration, data mining, and developing graphical representations.

The sheer amount of data generated daily is staggering, especially now that data analysis has taken center stage. In 2018 alone, the total data supply amounted to 2.8 trillion gigabytes. Hence, it is clear that enterprises willing to survive in a hypercompetitive world must possess an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.


QuestionPro is an online survey platform that empowers organizations to conduct data analysis and research, providing them with a medium to collect data by creating appealing surveys.




6 Examples of Using Standard Deviation in Real Life

The standard deviation is used to measure the spread of values in a dataset.

Individuals and companies use standard deviation all the time in different fields to gain a better understanding of datasets.

The following examples explain how the standard deviation is used in different real life scenarios.

Example 1: Standard Deviation in Weather Forecasting

Standard deviation is widely used in weather forecasting to understand how much variation exists in daily and monthly temperatures in different cities.

For example:

  • A weatherman who works in a city with a small standard deviation in temperatures year-round can confidently predict what the weather will be on a given day since temperatures don’t vary much from one day to the next.
  • A weatherman who works in a city with a high standard deviation in temperatures will be less confident in his predictions because there is much more variation in temperatures from one day to the next.
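To illustrate, a minimal sketch with invented daily highs for two hypothetical cities: similar means, very different standard deviations.

```python
# A minimal sketch with invented daily highs for two hypothetical cities.
import statistics

stable_city = [18, 19, 20, 19, 18, 20, 19]    # degrees Celsius
variable_city = [5, 14, 25, 9, 18, 2, 22]

for name, temps in [("stable city", stable_city),
                    ("variable city", variable_city)]:
    mean = statistics.mean(temps)
    sd = statistics.stdev(temps)
    print(f"{name}: mean={mean:.1f}C, sd={sd:.1f}C")
```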

Example 2: Standard Deviation in Healthcare

Standard deviation is widely used by insurance analysts and actuaries in the healthcare industry.

  • Insurance analysts often calculate the standard deviation of the ages of the individuals they insure so they can understand how much variation exists among their policyholders.
  • Actuaries calculate standard deviation of healthcare usage so they can know how much variation in usage to expect in a given month, quarter, or year.

Example 3: Standard Deviation in Real Estate

Standard deviation is a metric that is used often by real estate agents.

  • Real estate agents calculate the standard deviation of house prices in a particular area so they can inform their clients of the type of variation in house prices they can expect.
  • Real estate agents also calculate the standard deviation of the square footage of houses in certain areas so they can inform their clients on what variation to expect in house size in a particular area.

Example 4: Standard Deviation in Human Resources

Standard deviation is often used by individuals who work in Human Resource departments at companies.

  • Human Resource managers often calculate the standard deviation of salaries in a certain field so they know what range of salaries to offer new employees.

Example 5: Standard Deviation in Marketing

Standard deviation is often used by marketers to gain an understanding of how their advertisements perform.

  • Marketers often calculate the standard deviation of revenue earned per advertisement so they can understand how much variation to expect in revenue from a given ad.
  • Marketers also calculate the standard deviation of the number of ads used by competitors to understand whether competitors are running more or fewer ads than usual during a given period.

Example 6: Standard Deviation in Test Scores

Standard deviation is used by professors at universities to calculate the spread of test scores among students.

  • Professors can calculate the standard deviation of test scores on a final exam to better understand whether most students score close to the average or if there is a wide spread in test scores.
  • Professors can also calculate the standard deviation of test scores for multiple classes to understand which classes had the highest variation in test scores among students.

Additional Resources

The following tutorials offer more details on how standard deviation is used in real life.

  • Why is Standard Deviation Important? (Explanation + Examples)
  • What is Considered a Good Standard Deviation?
  • Range vs. Standard Deviation: When to Use Each


Published by Zach



Standard Deviation Formula and Uses vs. Variance


Amanda Bellucco-Chatham is an editor, writer, and fact-checker with years of experience researching personal finance topics. Specialties include general financial planning, career development, lending, retirement, tax preparation, and credit.

Standard deviation is a statistic that measures the dispersion of a dataset relative to its mean. It is calculated as the square root of the variance, which is itself derived from each data point's deviation relative to the mean.

If the data points are further from the mean, there is a higher deviation within the data set; thus, the more spread out the data, the higher the standard deviation.

Key Takeaways:

  • Standard deviation measures the dispersion of a dataset relative to its mean.
  • It is calculated as the square root of the variance.
  • Standard deviation, in finance, is often used as a measure of the relative riskiness of an asset.
  • A volatile stock has a high standard deviation, while the deviation of a stable blue-chip stock is usually rather low.
  • As a downside, the standard deviation calculates all uncertainty as risk, even when it’s in the investor's favor—such as above-average returns.


Standard deviation is a statistical measurement in finance that, when applied to the annual rate of return of an investment, sheds light on that investment's historical volatility .

The greater the standard deviation of securities, the greater the variance between each price and the mean, which shows a larger price range. For example, a volatile stock has a high standard deviation, while the deviation of a stable blue-chip stock is usually rather low.

Standard deviation is calculated by taking the square root of a value derived from comparing data points to a collective mean of a population. The formula is:

$$\text{Standard Deviation} = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - \overline{x}\right)^2}{n-1}}$$

where $x_i$ is the value of the $i^{th}$ point in the data set, $\overline{x}$ is the mean value of the data set, and $n$ is the number of data points in the data set.

Calculating standard deviation

Standard deviation is calculated as follows:

  • Calculate the mean of all data points by adding them together and dividing the sum by the number of data points.
  • Calculate the deviation of each data point by subtracting the mean from the data point's value.
  • Square each deviation (from Step 2).
  • Sum the squared deviations (from Step 3).
  • Divide the sum of squared deviations (from Step 4) by the number of data points less 1.
  • Take the square root of the quotient (from Step 5).
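Here is a minimal sketch of those six steps in plain Python, using the same four data points as the worked example further below:

```python
# A minimal sketch of the six steps, using the data from the
# worked example later in this article.
import math

data = [5, 7, 3, 7]

mean = sum(data) / len(data)             # Step 1: mean (5.5)
deviations = [x - mean for x in data]    # Step 2: deviation of each point
squared = [d ** 2 for d in deviations]   # Step 3: square each deviation
total = sum(squared)                     # Step 4: sum the squares (11.0)
variance = total / (len(data) - 1)       # Step 5: divide by n - 1 (~3.67)
std_dev = math.sqrt(variance)            # Step 6: square root (~1.915)

print(f"variance={variance:.2f}, standard deviation={std_dev:.3f}")
```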

Why Is Standard Deviation a Key Risk Measure?

Standard deviation is an especially useful tool in investing and trading strategies as it helps measure market and security volatility —and predict performance trends. As it relates to investing, for example, an index fund is likely to have a low standard deviation versus its benchmark index, as the fund's goal is to replicate the index.

On the other hand, one can expect aggressive growth funds to have a high standard deviation from relative stock indices, as their portfolio managers make aggressive bets to generate higher-than-average returns .

A lower standard deviation isn't necessarily preferable. It all depends on the investments and the investor's willingness to assume risk. When dealing with the amount of deviation in their portfolios, investors should consider their tolerance for volatility and their overall investment objectives. More aggressive investors may be comfortable with an investment strategy that opts for vehicles with higher-than-average volatility, while more conservative investors may not.

Standard deviation is one of the key fundamental risk measures that analysts, portfolio managers, and advisors use. Investment firms report the standard deviation of their mutual funds and other products. A large dispersion shows how much the return on the fund is deviating from the expected normal returns. Because it is easy to understand, this statistic is regularly reported to the end clients and investors.

Variance is derived by taking the mean of the data points, subtracting the mean from each data point individually, squaring each of these results, and then taking another mean of these squares. Standard deviation is the square root of the variance. All these calculations can be accomplished quickly using software like Excel.

The variance helps determine the data's spread size when compared to the mean value. As the variance gets bigger, more variation in data values occurs, and there may be a larger gap between one data value and another. If the data values are all close together, the variance will be smaller. However, this is more difficult to grasp than the standard deviation because variances represent a squared result that may not be meaningfully expressed on the same graph as the original dataset.

Standard deviations are usually easier to picture and apply. The standard deviation is expressed in the same unit of measurement as the data, which isn't necessarily the case with the variance. Using the standard deviation, statisticians may determine if the data has a normal curve or other mathematical relationship.

If the data behaves in a normal curve, then 68% of the data points will fall within one standard deviation of the average, or mean, data point. Larger variances cause more data points to fall outside the standard deviation. Smaller variances result in more data that is close to average.

The standard deviation is graphically depicted as a bell curve's width around the mean of a data set. The wider the curve, the larger a data set's standard deviation from the mean.
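A small simulation can make the 68% figure concrete. This sketch draws normally distributed samples (the mean of 100 and sigma of 15 are arbitrary choices) and counts the share falling within one standard deviation of the mean:

```python
# A minimal simulation sketch; the mean and sigma are arbitrary choices.
import random
import statistics

random.seed(0)
samples = [random.gauss(100, 15) for _ in range(10_000)]

mean = statistics.mean(samples)
sd = statistics.stdev(samples)
within = sum(mean - sd <= x <= mean + sd for x in samples) / len(samples)

print(f"share within one standard deviation: {within:.1%}")  # close to 68%
```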

Strengths of Standard Deviation

Standard deviation is a commonly used measure of dispersion. Many analysts are more familiar with standard deviation than with other statistical measures of dispersion. For this reason, the standard deviation is used in a wide variety of situations, from investing to actuarial science.

Standard deviation is all-inclusive of observations. Each data point is included in the analysis. Other measurements of deviation such as range only measure the most dispersed points without consideration for the points in between. Therefore, standard deviation is often considered a more robust, accurate measurement compared to other observations.

The standard deviation of two data sets can be combined using a specific combined standard deviation formula. There are no similar formulas for other dispersion observation measurements in statistics. In addition, and unlike other means of observation, the standard deviation can be used in further algebraic computations.

Limitations of Standard Deviation

There are some downsides to consider when using standard deviation. The standard deviation does not actually measure how far a data point is from the mean. Instead, it compares the square of the differences, a subtle but notable difference from actual dispersion from the mean.

Outliers have a heavier impact on standard deviation. This is especially true because the difference from the mean is squared, producing an even larger quantity compared to other data points. Be mindful, therefore, that standard deviation naturally gives more weight to extreme values.

Last, standard deviation can be difficult to calculate manually. As opposed to other measurements of dispersion such as range (the highest value minus the lowest value), standard deviation requires several cumbersome steps and is more likely to incur computational errors than easier measurements. This hurdle can be circumvented by using software such as a Bloomberg terminal.

Consider leveraging Excel when calculating standard deviation. After entering your data, use the STDEV.S formula if your data set is numeric or the STDEVA when you want to include text or logical values. There are also several specific formulas to calculate the standard deviation for an entire population.

Example of Standard Deviation

Say we have the data points 5, 7, 3, and 7, which total 22. You would then divide 22 by the number of data points, in this case, four—resulting in a mean of 5.5. This leads to the following determinations: x̄ = 5.5 and N = 4.

The deviations are found by subtracting the mean from each data point, resulting in -0.5, 1.5, -2.5, and 1.5. Each of those values is then squared, resulting in 0.25, 2.25, 6.25, and 2.25. The squared values are then added together, giving a total of 11, which is then divided by the value of N minus 1, which is 3, resulting in a variance of approximately 3.67.

The square root of the variance is then calculated, which results in a standard deviation measure of approximately 1.915.

Or consider shares of Apple (AAPL) for a period of five years. Historical returns for Apple’s stock were 88.97% for 2019, 82.31% for 2020, 34.65% for 2021, -26.41% for 2022 and, as of mid-April, 28.32% for 2023. The average return over the five years was thus 41.57%.

The value of each year's return less the mean is then 47.40%, 40.74%, -6.92%, -67.98%, and -13.25%, respectively. All those values are then squared to yield 22.47%, 16.60%, 0.48%, 46.21%, and 1.76%. The sum of these values is 0.8751. Divide that value by 4 (N minus 1) to get the variance: 0.8751 / 4 = 0.2188.

The square root of the variance is taken to obtain the standard deviation of approximately 0.4677, or 46.77%.
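For readers who want to verify the arithmetic, a short sketch recomputes the example with the standard library, with the returns entered as decimals:

```python
# A minimal check of the example above, with returns entered as decimals.
import statistics

annual_returns = [0.8897, 0.8231, 0.3465, -0.2641, 0.2832]

print(f"mean return: {statistics.mean(annual_returns):.4f}")   # ~0.4157
print(f"sample sd:   {statistics.stdev(annual_returns):.4f}")  # ~0.4677
```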

What Does a High Standard Deviation Mean?

A large standard deviation indicates that there is a lot of variance in the observed data around the mean. This indicates that the data observed is quite spread out. A small or low standard deviation would indicate instead that much of the data observed is clustered tightly around the mean.

What Does Standard Deviation Tell You?

Standard deviation describes how dispersed a set of data is. It compares each data point to the mean of all data points, and standard deviation returns a calculated value that describes whether the data points are in close proximity or whether they are spread out. In a normal distribution, standard deviation tells you how far values are from the mean.

How Do You Find the Standard Deviation Quickly?

If you look at the distribution of some observed data visually, you can see whether the shape is relatively skinny or fat. Fatter distributions have bigger standard deviations. Alternatively, Excel has built-in standard deviation functions, depending on the data set.

How Do You Calculate Standard Deviation?

Standard deviation is calculated as the square root of the variance. Alternatively, it is calculated by finding the mean of a data set, finding the difference of each data point to the mean, squaring the differences, adding them together, dividing by the number of points in the data set less 1, and finding the square root.

Standard deviation is important because it can help investors assess risk. Consider an investment option with an average annual return of 10% per year. However, this average was derived from the past three years' returns of 50%, -15%, and -5%. By calculating the standard deviation and understanding the low likelihood of actually averaging 10% in any single given year, you're better armed to make informed decisions and recognize the underlying risk.
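A quick sketch of that calculation shows how wide the spread really is: the three returns average 10%, but their sample standard deviation is 35%.

```python
# A minimal sketch: the advertised 10% average hides a 35% standard deviation.
import statistics

yearly_returns = [0.50, -0.15, -0.05]

mean = statistics.mean(yearly_returns)   # 0.10
sd = statistics.stdev(yearly_returns)    # 0.35

print(f"mean return {mean:.0%}, standard deviation {sd:.0%}")
```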

Netcials. " Apple Inc (AAPL) Stock 5 Years History ."



ORIGINAL RESEARCH article

Analyzing the efficacy of comprehensive testing: a comprehensive evaluation (provisionally accepted).

  • 1 Qassim University, Saudi Arabia

The final, formatted version of the article will be published soon.

This study aimed to examine the variations in comprehensive exam results in the English department at Qassim University in Saudi Arabia across six semesters, focusing on average score, range, and standard deviation, as well as overall student achievements. Additionally, it sought to assess the performance levels of male and female students in comprehensive tests and determine how they differ over the past six semesters. The research design utilized both analytical and descriptive approaches, with quantitative analysis of the data using frequency statistics such as mean, standard deviation, and range. The data consisted of scores from six consecutive exit exams. The findings reveal that male students scored slightly higher on average than female students, with minimal difference (p=.07). Moreover, male scores exhibited more variability and spread, indicating varying performance levels. These results suggest the need for further investigation into the factors that contribute to gender-based differences in test performance. Furthermore, longitudinal studies tracking individual student performance over multiple semesters could offer a more in-depth understanding of academic progress and the efficacy of comprehensive exam practices.

Keywords: comprehensive testing, EFL students, Evaluation, gender differences, quantitative research

Received: 15 Nov 2023; Accepted: 09 Apr 2024.

Copyright: © 2024 Alolaywi, Alkhalaf and Almuhilib. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Mx. Yasamiyan Alolaywi, Qassim University, Buraidah, Saudi Arabia



Cognitive decline may be detected using network analysis, according to Concordia researchers

A novel data visualization and analytical technique shows which variables may best indicate subtle change in brain function.

We all lose our car keys or our glasses from time to time. Most people would be correct to laugh it off as a normal part of aging. But for others, cognitive decline may start as a worrying but clinically unnoticeable step toward cognitive impairment, be it relatively mild or as severe as Alzheimer's disease.

The vast complexity of the human brain makes early diagnosis of cognitive decline difficult to achieve, which has potentially important implications for treatment and prevention. This is especially true for subjective cognitive decline, in which an individual reports concerns about memory or cognitive ability but shows no deviation on cognitive tests administered by clinicians.

That's the focus of a new paper in the journal Cortex by Concordia PhD student Nicholas Grunden and Department of Psychology professor Natalie Phillips. In it they use a novel technique called network analysis to study whether it can reveal the subtle changes associated with subjective cognitive decline that cannot be detected through standard test analyses.

A network approach models cognitive performance as a web of intertwined cognitive abilities that reflects the relationships between a set of variables, or nodes. The nodes here are the results of several neuropsychological tests, as well as participant characteristics like age, sex and education.

By running a statistical analysis of data merged from two large Canadian data sets, the researchers were able to visualize the strength of relationships between the nodes among people who are classified as cognitively normal (CN), or who have diagnoses of subjective cognitive decline (SCD), mild cognitive impairment (MCI) or Alzheimer's disease (AD).

"The nodes are connected by edges, which are the conditional associations between them," Grunden says. "The edge reflects how those variables work together. Are they positively correlated or negatively correlated? The network shows us how strong these associations are by how saturated the edges are. It's a built-in visual communication of findings."

Seeing the decline

After constructing the networks using the merged databases, the researchers identified two nodes that exert the strongest influence on the rest of the network: performance on tests of executive function and processing speed. Both are known to decline with age.

The strength of these two nodes, however, had marked decreases from the cognitively normal to the subjective cognitive decline to the mild cognitive impairment groups. This progressive gradient places SCD as an intermediate stage between CN and MCI.

"We found that very interesting, because it uncovered something that speaks to individuals' subjective concerns that are invisible in normal statistical analyses," Grunden explains.

"Executive function and processing speed are important cognitive abilities in that they contribute to other abilities (e.g., language, attention) and are integral to supporting an individual's day-to-day functioning in their lives. We know efficiency decreases as we age but we also see them at the initial stages of some types of progressive cognitive decline."

The researchers also noticed an important nuance in the role of age. While it is one of the strongest predictors of cognitive decline, and it exerted substantial influence on cognitive abilities among those classified CN and SCD, that influence waned among those classified MCI or AD. For them, other nodes measuring cognitive ability take on more weight.

"In other words, all things considered, age will be the biggest influence on cognition for older adults who do not show signs of Alzheimer's disease," says Phillips, the Concordia University Research Chair in Sensory-Cognitive Health in Aging and Dementia.

"But that is not the case in those individuals who have a diagnosis of MCI or Alzheimer's disease. For them, cognitive function is more associated with how advanced their disease is, as indicated by general measures of clinical status on standardized cognition tests like the Montreal Cognitive Assessment Test."

Grunden says network analysis can help researchers examine brain function as a system rather than isolated variables acting upon each other.

"This helps us read between the lines, because we can look at the interrelationships between all of the variables at the same time," he says. "You can pick up on indicators that are less apparent in single elements of data and instead focus on associations between them."

The Fonds de recherche du Québec -- Nature et technologies (FRQNT), the Fondation Famille Lemaire and the Centre for Research on Brain, Language and Music contributed funding for this study. The researchers used data from the Consortium for the early identification of Alzheimer's disease -- Quebec (CIMA-Q) and the Comprehensive Assessment on Neurodegeneration and Dementia (COMPASS-ND) databases.


Story Source:

Materials provided by Concordia University. Original written by Patrick Lejtenyi. Note: Content may be edited for style and length.

Journal Reference:

  • Nicholas Grunden, Natalie A. Phillips. A network approach to subjective cognitive decline: Exploring multivariate relationships in neuropsychological test performance across Alzheimer's disease risk states. Cortex, 2024; 173: 313. DOI: 10.1016/j.cortex.2024.02.005



Maternal and Infant Research Electronic Data Analysis (MIREDA): A protocol for creating a common data model for federated analysis of UK birth cohorts and the life course.

Authors: Michael Seaborne (correspondence: [email protected]), Hope Jones, Neil Cockburn, Stevo Durbaba, Tom Giles, Arturo Gonzalez-Izquierdo, Amy Hough, Dan Mason, Armando Mendez-Villalon, Carlos Sanchez-Soriano, Chris Orton, David Ford, Phillip Quinlan, Krish Nirantharakumar, Lucilla Poston, Rebecca Reynolds, Gillian Santorelli, Sinead Brophy.

Introduction: Birth cohorts are valuable resources for studying early life and the determinants of health, disease, and development, and they are essential for studying the life course. Electronic cohorts are live, dynamic longitudinal cohorts using anonymised, routinely collected data. There is no selection bias through direct recruitment, but they are limited to health and administrative system data and may lack contextual information. The MIREDA (Maternal and Infant Research Electronic Data Analysis) partnership creates a UK-wide birth cohort by aligning existing electronic birth cohorts to have the same structure, content, and vocabularies, enabling UK-wide federated analyses.

Objectives: 1) Create a core dynamic, live UK-wide electronic birth cohort with approximately 100,000 new births per year using a common data model (CDM). 2) Provide data linkage and automation for long-term follow-up of births from MuM-PreDiCT and the Born-in initiatives of Bradford, Wales, Scotland, and South London for comparable analyses.

Methods: We will establish core data content and collate linkable data. A suite of extraction, transformation, and load (ETL) tools will be used to transform the data for each birth cohort into the CDM. Transformed datasets will remain within each cohort's trusted research environment (TRE). Metadata will be uploaded for the public to the Health Data Research (HDRUK) Innovation Gateway. We will develop a single online data access request for researchers, and a cohort profile will be developed for researchers to reference the resource.

Ethics: Each cohort has approval from its TRE through compliance with its project application processes and information governance.

Dissemination: We will engage with researchers in the field to promote our resource through partnership networking, publication, research collaborations, conferences, social media, and marketing communications strategies.

Keywords: Birth Cohort, Life Course Perspective, Data Science, Data Curation, Routinely Collected Health Data, Electronic Health Records, Unified Medical Language System.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported by an MRC Partnership Grant [MR/X02055X/1], MatCHNet pump-priming [U20005/302873] and an MRC Programme Grant [MR/X009742/1].

Author Declarations


The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Access to data is granted according to the information governance requirements of each TRE. The Data Protection Act 2018 is not applicable to anonymised data and the OMOP CDM will be anonymised and provide aggregated data and statistics only. Each TRE has ethical approval for its operation and use, thus no additional ethical approval was required beyond the standard project approval by official channels.


Data Availability

Data will be available upon reasonable request through the Health Data Research (HDRUK) Innovation Gateway.



    Data assimilation using particle image velocimetry (PIV) and Reynolds-averaged Navier-Stokes (RANS) simulation was performed for an ideally expanded supersonic jet flying at a Mach number of 2.0. The present study aims to efficiently reconstruct all the physical quantities in the aeroacoustic fields that match well with a realistic, experimentally obtained flow field. The two-dimensional ...