Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans. Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis (H0) and alternate hypothesis (Ha or H1).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test.
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

  • Step 1: State your null and alternate hypothesis
  • Step 2: Collect data
  • Step 3: Perform a statistical test
  • Step 4: Decide whether to reject or fail to reject your null hypothesis
  • Step 5: Present your findings
  • Other interesting articles
  • Frequently asked questions about hypothesis testing

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H0) and alternate (Ha) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H0: Men are, on average, not taller than women.
  • Ha: Men are, on average, taller than women.


For a statistical test to be valid, it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p-value. This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p-value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data.

For example, a t test comparing average height between men and women will output:

  • an estimate of the difference in average height between the two groups.
  • a p-value showing how likely you are to see this difference if the null hypothesis of no difference is true.
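A minimal sketch of such a test in Python with scipy (the data here are simulated, purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
heights_m = rng.normal(175, 7, size=100)  # hypothetical male heights (cm)
heights_f = rng.normal(169, 7, size=100)  # hypothetical female heights (cm)

# One-sided two-sample t test: Ha is "men are, on average, taller"
t_stat, p_value = stats.ttest_ind(heights_m, heights_f, alternative="greater")

print(f"estimated difference: {heights_m.mean() - heights_f.mean():.2f} cm")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

If the printed p-value falls below your predetermined significance level, you would reject the null hypothesis of no difference.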

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p -value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).

The results of hypothesis testing will be presented in the results and discussion sections of your research paper, dissertation, or thesis.

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p-value). In the discussion, you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis. This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis. But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis.

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

Statistics

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Frequently asked questions

What is hypothesis testing?

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

What is a hypothesis?

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

What are null and alternative hypotheses?

Null and alternative hypotheses are used in statistical hypothesis testing. The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.


Standard Error In Statistics: What It Is, Why It Matters, & How to Calculate

By Julia Simkus, BA (Hons) Psychology, Princeton University. Reviewed by Saul Mcleod, PhD, University of Manchester.

When you take samples from a population and calculate the means of the samples, these means will be arranged into a distribution around the true population mean.

The standard deviation of this distribution of sampling means is known as the standard error.

Large vs. Small Standard Error

Standard error estimates how accurately the mean of any given sample represents the true mean of the population.

A larger standard error indicates that the means are more spread out, and thus it is more likely that your sample mean is an inaccurate representation of the true population mean.

On the other hand, a smaller standard error indicates that the means are closer together. Thus it is more likely that your sample mean is an accurate representation of the true population mean.

The standard error increases when the standard deviation increases. Standard error decreases when sample size increases because having more data yields less variation in your results.

Standard error formula

$$SE = \frac{\sigma}{\sqrt{n}}$$

where:

SE = standard error of the sample

σ = sample standard deviation

n = sample size (the number of observations in the sample)

How to Calculate

Standard error is calculated by dividing the standard deviation of the sample by the square root of the sample size.
  • Calculate the mean of the sample.
  • Calculate each measurement’s deviation from the mean.
  • Square each deviation from the mean.
  • Add the squared deviations from Step 3.
  • Divide the sum of the squared deviations by one less than the sample size (n-1).
  • Calculate the square root of the value obtained from Step 5. This result gives you the standard deviation.
  • Divide the standard deviation by the square root of the sample size (n). This result gives you the standard error.
  • Subtracting the standard error from the mean / adding the standard error to the mean will give the mean ± 1 standard error.

The values in your sample are 52, 60, 55, and 65.

  • Calculate the mean of these values by adding them together and dividing by 4. (52 + 60 + 55 + 65)/4 = 58 (Step 1).
  • Next, calculate the sum of the squared deviations of each sample value from the mean (Steps 2-4).
  • Using the values in this example, the squared deviations are (58 – 52)^2= 36, (58 – 60)^2= 4, (58 – 55)^2=9, and (58 – 65)^2=49. Therefore, the sum of the squared deviations is 98 (36 + 4 + 9 + 49).
  • Next, divide the sum of the squared deviations by the sample size minus one and take the square root (Steps 5-6). The standard deviation in this example is the square root of [98 / (4 – 1)], which is about 5.72.
  • Lastly, divide the standard deviation, 5.72, by the square root of the sample size, 4 (Step 7). The resulting value is 2.86, which gives the standard error of the values in this example.
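The same calculation in Python, as a quick check (standard library only):

```python
import statistics

values = [52, 60, 55, 65]

mean = statistics.mean(values)  # 58
sd = statistics.stdev(values)   # sample SD with n - 1 denominator, ≈ 5.72
se = sd / len(values) ** 0.5    # standard error, ≈ 2.86

print(f"mean = {mean}, sd = {sd:.2f}, se = {se:.2f}")
```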

1. What is the standard error?

The standard error is a statistical term that measures the accuracy with which a sample distribution represents a population by using the standard deviation of the sample mean.

2. What is a good standard error?

Determining a “good” standard error can be context-dependent. As a general rule, a smaller standard error is better because it suggests your sample mean is a reliable estimate of the population mean. However, what constitutes “small” can depend on the scale of your data and the size of your sample.

3. What does standard error tell you?

The standard error measures how spread out the means of different samples would be if you were to perform your study or experiment many times. A lower SE would indicate that most sample means cluster tightly around the population mean, while a higher SE indicates that the sample means are spread out over a wider range.

It is used to construct confidence intervals for the mean and in hypothesis testing.

4. When should Standard Error be used?

We use the standard error to indicate the uncertainty around the estimate of the mean measurement. It tells us how well our sample data represents the whole population. This is useful when we want to calculate a confidence interval.

5. What is the Difference between Standard Error and Standard Deviation?

Standard error and standard deviation are both measures of variability, but the standard deviation is a descriptive statistic that can be calculated from sample data, while standard error is an inferential statistic that can only be estimated.

Standard deviation tells us how concentrated the data is around the mean. It describes variability within a single sample. On the other hand, the standard error tells us how the mean itself is distributed.

It estimates the variability across multiple samples of a population. The standard error is calculated as the standard deviation divided by the square root of the sample size.


Hypothesis Testing

First Online: 26 April 2022

Melissa Whatley (ORCID: orcid.org/0000-0002-7073-6772), School for International Training, Brattleboro, VT, USA

Part of the book series: Springer Texts in Education (SPTE)

Drawing from the properties of the normal distribution, this chapter introduces key concepts in hypothesis testing, including the Central Limit Theorem, the sampling distribution, the expected value of the mean, and the estimated standard error. These concepts are illustrated through three types of t-tests: one-sample t-test, two-samples t-tests, and dependent samples t-tests.


The full details of the Central Limit Theorem are beyond the scope of this book, but I encourage you to explore them further in a more advanced statistics textbook, such as those listed in the recommended reading list at the end of this chapter.

Note here that we also assume that our sample is valid.

When I teach Type 1 and Type 2 error, I often draw a parallel to my dogs, Taca, Bernice, Lola, and Fritsi, barking at the front door of my house. Sometimes, they bark at the front door even when no one is there (perhaps there are ghosts in my neighborhood?). This is a Type 1 error. They detect an effect (effect = someone at the door) that does not exist in reality. Other times, they fail to bark at the front door even when there is someone there (a couple weeks ago, I ordered pizza delivery for dinner, and they did not even notice when the delivery person arrived to drop the pizza off). This is a Type 2 error. They do not detect an effect even though it exists in reality.

Technically, this formula is only appropriate when you do not know the population standard deviation and have to estimate the standard error using the sample standard deviation, an assumption we make throughout this chapter.

These are not the only additional mean-comparison hypothesis tests, but they are likely the most useful for readers of this book.

The formula for the estimated standard error of the difference between two independent sample means when sample sizes are not equal is considerably more complex:

\(s_{\overline{x}_1 - \overline{x}_2} = \sqrt{\frac{SS_1 + SS_2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\)

This formula also assumes that population standard deviations are equal. Here, \(SS_1\) and \(SS_2\) refer to the sum of squared deviations for each sample, while \(n_1\) and \(n_2\) are the two sample sizes. Essentially, what this formula does is weight the standard error to account for the fact that the sample sizes are unequal. Differences in sample sizes are especially problematic when one sample's variance is dramatically different from the variance of the other sample (a violation of the homogeneity of variance assumption). This can lead to very misleading results. When you calculate a two-samples t-test using a statistical software program, this adjustment for sample size, if needed, is made for you automatically. It is also possible to compute a standard error that allows for unequal variances between samples.
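To illustrate that last point with a brief sketch (the data are simulated, not from the chapter), scipy's `ttest_ind` switches between the pooled, equal-variance standard error and the unequal-variance (Welch) version via its `equal_var` flag:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group1 = rng.normal(50, 5, size=20)   # hypothetical sample 1
group2 = rng.normal(53, 12, size=35)  # hypothetical sample 2: larger n, larger spread

# Pooled standard error (assumes homogeneity of variance)
t_pooled, p_pooled = stats.ttest_ind(group1, group2, equal_var=True)

# Welch's t-test (allows unequal variances between samples)
t_welch, p_welch = stats.ttest_ind(group1, group2, equal_var=False)

print(f"pooled: t = {t_pooled:.3f}, p = {p_pooled:.4f}")
print(f"Welch:  t = {t_welch:.3f}, p = {p_welch:.4f}")
```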

Recommended Reading

A Deeper Dive

Urdan, T. C. (2017a). Standard errors. In Statistics in plain English (pp. 57–72). Routledge.


Urdan, T. C. (2017b). Statistical significance, effect size, and confidence intervals. In Statistics in plain English (pp. 73–92). Routledge.

Urdan, T. C. (2017c). t Tests. In Statistics in plain English (pp. 73–112). Routledge.

Wheelan, C. (2013a). Basic probability: Don’t buy the extended warranty on your $99 printer. In Naked statistics: Stripping the dread from data (pp. 68–89). Norton.

Wheelan, C. (2013b). The Monty Hall problem. In Naked statistics: Stripping the dread from data (pp. 90–94). Norton.

Wheelan, C. (2013c). Problems with probability: How overconfident math geeks nearly destroyed the global financial system. In Naked statistics: Stripping the dread from data (pp.95–109). Norton.

Wheelan, C. (2013d). The central limit theorem. In Naked statistics: Stripping the dread from data (pp. 127–142). Norton.

Additional Examples

Cartwright, C., Stevens, M., & Schneider, K. (2021). Constructing the learning outcomes with intercultural assessment: A 3-year study of a graduate study abroad and glocal experience programs. Frontiers: The Interdisciplinary Journal of Study Abroad ,  33 (1), 82–105.

Echcharfy, M. (2020). Intercultural learning in Moroccan higher education: A comparison between teachers’ perceptions and students’ expectations. International Journal of Research in English Education, 5(1), 19–35.


Yang, L., Borrowman, L., Tan, M. Y., & New, J. Y. (2020). Expectations in transition: Students’ and teachers’ expectations of university in an international branch campus. Journal of Studies in International Education, 24 (3), 352–370.


Standard Error in Statistics – Understanding the concept, formula and how to calculate

  • October 4, 2020
  • Selva Prabhakaran

The standard error of the mean measures how spread out sample means can be around the actual population mean. Standard error allows you to build a relationship between a sample statistic (computed from a smaller sample of the population) and the population’s actual parameter.


What is Standard Error?

The standard error serves as a means to understand the actual population parameter (like the population mean) without measuring the entire population.

Consider the following scenario:

A researcher ‘X’ is collecting data from a large population of voters. For practical reasons he can’t reach out to each and every voter. So, only a small randomized sample (of voters) is selected for data collection.

Once the data for the sample is collected, you calculate the mean (or any statistic) of that sample. But then, this mean you just computed is only the sample mean. It cannot be considered the entire population’s mean. You can, however, expect it to be somewhere close to the population’s mean.

So how can you know the actual population mean?

While it’s not possible to compute the exact value, you can use the standard error to estimate how far the sample mean may spread from the actual population mean.

To be more precise,

The Standard Error of the Mean describes how far a sample mean may vary from the population mean.

In this post, you will understand clearly:

  • What does the standard error tell us?
  • What is the standard error formula?
  • How to calculate the standard error?
  • How to use the standard error to compute a confidence interval?
  • Example problem and solution

How to understand Standard Error?

Let’s first clearly understand the intuition behind and the need for standard error.

Now, let’s suppose you are working in the agriculture domain and you want to know the annual yield of a particular variety of coconut trees. While the entire population of coconut trees has a certain mean (and standard deviation) of annual yield, it is not practical to take measurements of each and every tree out there.

So, what do you do?

To estimate this, you collect samples of coconut yield (number of nuts per tree per year) from different trees. And to keep your findings unbiased, you collect samples across different places.


Let’s say you collected data from approximately five trees per sample from different places, along the lines of the numbers shown below.
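The original post’s figures are not reproduced here; the following hypothetical values illustrate the setup:

```python
# Hypothetical annual yields (nuts per tree per year), ~5 trees per sample
sample1 = [400, 420, 470, 510, 590]
sample2 = [430, 500, 570, 620, 710]
sample3 = [360, 410, 460, 520, 550]
```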

In the above data, the variables sample1, sample2, and sample3 contain the collected annual yield values, where each number represents the yield of one individual tree.

Observe that the yield varies not just across the trees, but also across the different samples.

[Figure: sampling distribution of the sample means, illustrating the standard error.]

Although we compute the means of the samples, we are actually not interested in the sample means themselves, but in the overall mean annual yield of this variety of coconut.

Now, you may ask: ‘Why can’t we just put the values from all these samples in one bucket and simply compute the mean and standard deviation and consider that as the population’s parameter?’

Well, the problem is, if you do that, then as you collect a few more samples the real population’s parameter begins to emerge, and it is likely to be (slightly) different from the parameter you computed earlier.

Below is a code demo.
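The original demo is not reproduced here; a minimal sketch of the idea (with simulated yields) might look like this:

```python
import numpy as np

rng = np.random.default_rng(42)
pooled = np.array([])

# Pool successive samples and watch the computed parameters drift
for i in range(1, 6):
    new_sample = rng.normal(500, 80, size=5)  # hypothetical yields of 5 trees
    pooled = np.concatenate([pooled, new_sample])
    print(f"after sample {i}: mean = {pooled.mean():.1f}, "
          f"sd = {pooled.std(ddof=1):.1f}")
```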

As you add more and more samples, the computed parameters keep changing.

So how to tackle this?

If you notice, each sample has its own mean that varies within a particular range. This mean (of the sample) has its own standard deviation. This measure of the standard deviation of the mean is called the standard error of the mean.

It’s important to note that the standard error is different from the standard deviation of the data. The difference is, while the standard deviation tells you how the overall data is distributed around the mean, the standard error tells you how the mean itself is distributed.

This way, it can be used to generalize from the sample mean to an estimate of the whole population’s mean.

In fact, the standard error can be generalized to any statistic, like the standard deviation or the median. For example, if you compute the standard deviation of the standard deviations (of the samples), it is called the standard error of the standard deviation. Feels like a tongue twister. But most commonly, when someone mentions ‘standard error’, it typically refers to the standard error of the mean.

What is the Formula?

To calculate standard error, you simply divide the standard deviation of a given sample by the square root of the total number of items in the sample.

$$SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

where $SE_{\bar{x}}$ is the standard error of the mean, $\sigma$ is the standard deviation of the sample, and $n$ is the number of items in the sample.

Do not confuse this with the standard deviation: the standard error of a sample statistic (like the mean) is typically much smaller than the population standard deviation.

Notice a few things here:

  • The standard error depends on the number of items in the sample. As you increase the number of items in the sample, the lower the standard error and the more certain you can be about the estimates.
  • It uses statistics (standard deviation and number of items) computed from the sample itself, and not of the population. That is, you don’t need to know the population parameters beforehand to compute the standard error. This makes it pretty convenient.
  • The standard error can also be used as an estimate of how representative a given sample is of a population. The smaller the value, the more representative the sample is of the whole population.

Below is a computation for the standard error of the mean:
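A sketch, reusing the hypothetical `sample1` values from earlier; `scipy.stats.sem` computes the same quantity directly:

```python
import numpy as np
from scipy import stats

sample1 = [400, 420, 470, 510, 590]  # hypothetical yields from above

se_manual = np.std(sample1, ddof=1) / np.sqrt(len(sample1))
se_scipy = stats.sem(sample1)  # scipy's built-in standard error of the mean

print(se_manual, se_scipy)  # both ≈ 33.97
```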

How to calculate standard error?

Problem Statement

A school aptitude test for 15-year-old students studying a particular territory’s curriculum is designed to have a mean score of 80 units and a standard deviation of 10 units. A sample of 15 answer papers has a mean score of 85. Can we assume that these 15 scores come from the designated population?

Our task is to determine if this sample comes from the above mentioned population.

How to solve this?

We approach this problem by computing the standard error of the sample means and use it to compute the confidence interval between which the sample means are expected to fall.

If the given sample mean falls inside this interval, then it’s safe to assume that the sample comes from the given population.

Time to get into the math.

Using standard error to compute confidence interval

Standard error is often used to compute confidence intervals.

We know n = 15, $\bar{x}$ = 85, and σ = 10.

$$SE_\bar{x} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{15}} = 2.581$$

From a property of the normal distribution, we can say with a 95% confidence level that the sample means are expected to lie within an interval of plus or minus two standard errors around the population parameter.

But where did the ‘normal distribution’ come from? You may wonder how we can directly assume that a normal distribution is followed in this case. Or rather, shouldn’t we first test whether the sample follows a normal distribution before computing the confidence intervals?

Well, that’s NOT required. Because the Central Limit Theorem tells us that even if a population is not normally distributed, a collection of sample means from that population will in fact follow a normal distribution. So, it’s a valid assumption.

Back to the problem, let’s compute the confidence interval for a 95% confidence level.

  • Lower Limit : 80 - (2*2.581) = 74.838
  • Upper Limit : 80 + (2*2.581) = 85.162

So, 95% of our 15-item sample means are expected to fall between 74.838 and 85.162.

Since the sample mean of 85.0 lies within the computed range, there is no reason to believe that the sample does not belong to the population.
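The whole check takes a few lines of Python:

```python
import math

n, x_bar, mu, sigma = 15, 85, 80, 10

se = sigma / math.sqrt(n)                # ≈ 2.581
lower, upper = mu - 2 * se, mu + 2 * se  # ≈ 74.838 and 85.162

print(f"SE = {se:.3f}, interval = ({lower:.3f}, {upper:.3f})")
print("sample mean inside interval:", lower <= x_bar <= upper)  # True
```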

Standard error is a commonly used term whose significance we sometimes fail to fully appreciate. I hope the concept is clear and that you can now relate how to use the standard error in appropriate situations.



Standard Error in Hypothesis Testing

The standard error is an indispensable tool in the kit of a researcher, because it is used in testing the validity of statistical hypothesis. The standard deviation of the sampling distribution of a statistic is called the standard error . The standard error is important in dealing with statistics (measures of samples) which are normally distributed.

Here the use of the word “error” is justified by the fact that we usually regard the expected value as the true value and the divergence from it as an error of estimation due to sampling fluctuations. The term standard error has a wider meaning than merely the standard deviation of simple sampling for the following reasons:


  • The standard error may fairly be taken to measure the unreliability of the sample estimate. The greater the standard error, the greater the difference between observed and expected values and the greater the unreliability of the sample estimate. On the other hand, the smaller the standard error, the smaller the difference between observed and expected values and the greater the reliability of the sample estimate. The reciprocal of the standard error may be regarded as a measure of reliability. The reliability of an observed proportion varies as the square root of the number of observations on which it is based: the larger the sample size, the smaller the standard error, and the smaller the sample size, the larger the standard error. If we want to double the precision, the number of observations in the sample should be increased fourfold.
  • The most important use of standard errors is in the construction of confidence intervals within which parameter values are expected to fall. For a sufficiently large sample size, the sampling distribution tends to be normal. Therefore, at the 5% level of significance, the population mean is expected to lie within the interval x̄ ± 1.96 standard errors of the mean. Similarly, at the 1% level of significance, the population mean is expected to fall within the interval x̄ ± 2.58 standard errors of the mean. In the same way, other confidence intervals may be constructed (see the sketch below).
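A minimal sketch of these interval constructions (the sample statistics below are hypothetical):

```python
import math

x_bar, s, n = 52.5, 4.2, 100  # hypothetical sample mean, SD, and size

se = s / math.sqrt(n)
ci_95 = (x_bar - 1.96 * se, x_bar + 1.96 * se)  # 5% significance level
ci_99 = (x_bar - 2.58 * se, x_bar + 2.58 * se)  # 1% significance level

print(f"95% CI: ({ci_95[0]:.2f}, {ci_95[1]:.2f})")
print(f"99% CI: ({ci_99[0]:.2f}, {ci_99[1]:.2f})")
```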



S.3.2 Hypothesis Testing (P-Value Approach)

The P-value approach involves determining "likely" or "unlikely" by determining the probability, assuming the null hypothesis were true, of observing a more extreme test statistic in the direction of the alternative hypothesis than the one observed. If the P-value is small, say less than (or equal to) \(\alpha\), then it is "unlikely." And, if the P-value is large, say more than \(\alpha\), then it is "likely."

If the P-value is less than (or equal to) \(\alpha\), then the null hypothesis is rejected in favor of the alternative hypothesis. And, if the P-value is greater than \(\alpha\), then the null hypothesis is not rejected.

Specifically, the four steps involved in using the P-value approach to conducting any hypothesis test are:

  • Specify the null and alternative hypotheses.
  • Using the sample data and assuming the null hypothesis is true, calculate the value of the test statistic. Again, to conduct the hypothesis test for the population mean μ, we use the t-statistic \(t^*=\frac{\bar{x}-\mu}{s/\sqrt{n}}\), which follows a t-distribution with n - 1 degrees of freedom.
  • Using the known distribution of the test statistic, calculate the P-value: "If the null hypothesis is true, what is the probability that we'd observe a more extreme test statistic in the direction of the alternative hypothesis than we did?" (Note how this question is equivalent to the question answered in criminal trials: "If the defendant is innocent, what is the chance that we'd observe such extreme criminal evidence?")
  • Set the significance level, \(\alpha\), the probability of making a Type I error, to be small: 0.01, 0.05, or 0.10. Compare the P-value to \(\alpha\). If the P-value is less than (or equal to) \(\alpha\), reject the null hypothesis in favor of the alternative hypothesis. If the P-value is greater than \(\alpha\), do not reject the null hypothesis.

Example S.3.2.1

Mean GPA

In our example concerning the mean grade point average, suppose that our random sample of n = 15 students majoring in mathematics yields a test statistic t* equaling 2.5. Since n = 15, our test statistic t* has n - 1 = 14 degrees of freedom. Also, suppose we set our significance level α at 0.05 so that we have only a 5% chance of making a Type I error.

Right Tailed

The P-value for conducting the right-tailed test H0: μ = 3 versus HA: μ > 3 is the probability that we would observe a test statistic greater than t* = 2.5 if the population mean μ really were 3. Recall that probability equals the area under the probability curve. The P-value is therefore the area under a t14 curve (n - 1 = 14 degrees of freedom) to the right of the test statistic t* = 2.5. It can be shown using statistical software that the P-value is 0.0127. The graph depicts this visually.

[Figure: t-distribution with the area to the right of t* = 2.5 shaded.]

The P-value, 0.0127, tells us it is "unlikely" that we would observe such an extreme test statistic t* in the direction of HA if the null hypothesis were true. Therefore, our initial assumption that the null hypothesis is true must be incorrect. That is, since the P-value, 0.0127, is less than \(\alpha\) = 0.05, we reject the null hypothesis H0: μ = 3 in favor of the alternative hypothesis HA: μ > 3.

Note that we would not reject H0: μ = 3 in favor of HA: μ > 3 if we lowered our willingness to make a Type I error to \(\alpha\) = 0.01 instead, as the P-value, 0.0127, is then greater than \(\alpha\) = 0.01.

Left Tailed

In our example concerning the mean grade point average, suppose that our random sample of n = 15 students majoring in mathematics instead yields a test statistic t* equaling -2.5. The P-value for conducting the left-tailed test H0: μ = 3 versus HA: μ < 3 is the probability that we would observe a test statistic less than t* = -2.5 if the population mean μ really were 3. The P-value is therefore the area under a t14 curve (n - 1 = 14 degrees of freedom) to the left of the test statistic t* = -2.5. It can be shown using statistical software that the P-value is 0.0127. The graph depicts this visually.

[Figure: t-distribution with the area to the left of t* = -2.5 shaded.]

The P-value, 0.0127, tells us it is "unlikely" that we would observe such an extreme test statistic t* in the direction of HA if the null hypothesis were true. Therefore, our initial assumption that the null hypothesis is true must be incorrect. That is, since the P-value, 0.0127, is less than α = 0.05, we reject the null hypothesis H0: μ = 3 in favor of the alternative hypothesis HA: μ < 3.

Note that we would not reject H0: μ = 3 in favor of HA: μ < 3 if we lowered our willingness to make a Type I error to α = 0.01 instead, as the P-value, 0.0127, is then greater than \(\alpha\) = 0.01.

Two Tailed

In our example concerning the mean grade point average, suppose again that our random sample of n = 15 students majoring in mathematics yields a test statistic t* equaling -2.5. The P-value for conducting the two-tailed test H0: μ = 3 versus HA: μ ≠ 3 is the probability that we would observe a test statistic less than -2.5 or greater than 2.5 if the population mean μ really were 3. That is, the two-tailed test requires taking into account the possibility that the test statistic could fall into either tail (hence the name "two-tailed" test). The P-value is, therefore, the area under a t14 curve to the left of -2.5 plus the area to the right of 2.5. It can be shown using statistical software that the P-value is 0.0127 + 0.0127, or 0.0254. The graph depicts this visually.

[Figure: t-distribution with both tail areas beyond -2.5 and 2.5 shaded.]

Note that the P-value for a two-tailed test is always two times the P-value for either of the one-tailed tests. The P-value, 0.0254, tells us it is "unlikely" that we would observe such an extreme test statistic t* in the direction of HA if the null hypothesis were true. Therefore, our initial assumption that the null hypothesis is true must be incorrect. That is, since the P-value, 0.0254, is less than α = 0.05, we reject the null hypothesis H0: μ = 3 in favor of the alternative hypothesis HA: μ ≠ 3.

Note that we would not reject H0: μ = 3 in favor of HA: μ ≠ 3 if we lowered our willingness to make a Type I error to α = 0.01 instead, as the P-value, 0.0254, is then greater than \(\alpha\) = 0.01.

Now that we have reviewed the critical value and P -value approach procedures for each of the three possible hypotheses, let's look at three new examples — one of a right-tailed test, one of a left-tailed test, and one of a two-tailed test.

The good news is that, whenever possible, we will take advantage of the test statistics and P -values reported in statistical software, such as Minitab, to conduct our hypothesis tests in this course.
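The same three p-values can be checked in Python with scipy instead of Minitab:

```python
from scipy import stats

t_star, df = 2.5, 14

p_right = stats.t.sf(t_star, df)         # P(T > 2.5)  ≈ 0.0127
p_left = stats.t.cdf(-t_star, df)        # P(T < -2.5) ≈ 0.0127
p_two = 2 * stats.t.sf(abs(t_star), df)  # two-tailed  ≈ 0.0254

print(p_right, p_left, p_two)
```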


8.6: Hypothesis Test of a Single Population Mean with Examples


Steps for Performing a Hypothesis Test of a Single Population Mean

Step 1: State your hypotheses about the population mean.

Step 2: Summarize the data. State a significance level. State and check the conditions required for the procedure.

  • Find or identify the sample size, n, the sample mean, \(\bar{x}\) and the sample standard deviation, s .

The sampling distribution for the one-mean test statistic is approximately a t-distribution if the following conditions are met:

  • Sample is random with independent observations.
  • Sample is large: the population must be normal or the sample size must be at least 30.

Step 3: Perform the procedure based on the assumption that \(H_{0}\) is true

  • Find the Estimated Standard Error: \(SE=\frac{s}{\sqrt{n}}\).
  • Compute the observed value of the test statistic: \(T_{obs}=\frac{\bar{x}-\mu_{0}}{SE}\).
  • Check the type of the test (right-, left-, or two-tailed)
  • Find the p-value in order to measure your level of surprise.

Step 4: Make a decision about \(H_{0}\) and \(H_{a}\)

  • Do you reject or not reject your null hypothesis?

Step 5: Make a conclusion

  • What does this mean in the context of the data?
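Before the examples, here is a compact sketch of Steps 3 and 4 in Python (the function name and interface are our own, not from the text):

```python
import numpy as np
from scipy import stats

def one_mean_ttest(data, mu0, tail="two"):
    """Return T_obs and the p-value for a one-mean t-test (sketch)."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    se = data.std(ddof=1) / np.sqrt(n)  # estimated standard error
    t_obs = (data.mean() - mu0) / se    # observed test statistic
    if tail == "right":
        p = stats.t.sf(t_obs, df=n - 1)
    elif tail == "left":
        p = stats.t.cdf(t_obs, df=n - 1)
    else:  # two-tailed
        p = 2 * stats.t.sf(abs(t_obs), df=n - 1)
    return t_obs, p
```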

The following examples illustrate a left-, right-, and two-tailed test.

Example \(\PageIndex{1}\)

\(H_{0}: \mu = 5, H_{a}: \mu < 5\)

Test of a single population mean. \(H_{a}\) tells you the test is left-tailed. The picture of the \(p\)-value is as follows:

[Figure: normal curve centered at 5 with the p-value as the shaded left-tail area.]

Exercise \(\PageIndex{1}\)

\(H_{0}: \mu = 10, H_{a}: \mu < 10\)

Assume the \(p\)-value is 0.0935. What type of test is this? Draw the picture of the \(p\)-value.

left-tailed test


Example \(\PageIndex{2}\)

\(H_{0}: \mu \leq 0.2, H_{a}: \mu > 0.2\)

This is a test of a single population mean. \(H_{a}\) tells you the test is right-tailed. The picture of the \(p\)-value is as follows:

[Figure: normal curve centered at 0.2 with the p-value as the shaded right-tail area.]

Exercise \(\PageIndex{2}\)

\(H_{0}: \mu \leq 1, H_{a}: \mu > 1\)

Assume the \(p\)-value is 0.1243. What type of test is this? Draw the picture of the \(p\)-value.

right-tailed test


Example \(\PageIndex{3}\)

\(H_{0}: \mu = 50, H_{a}: \mu \neq 50\)

This is a test of a single population mean. \(H_{a}\) tells you the test is two-tailed . The picture of the \(p\)-value is as follows.

[Figure: normal curve centered at 50 with \(\frac{1}{2}\)(p-value) shaded in each tail.]

Exercise \(\PageIndex{3}\)

\(H_{0}: \mu = 0.5, H_{a}: \mu \neq 0.5\)

Assume the p -value is 0.2564. What type of test is this? Draw the picture of the \(p\)-value.

two-tailed test


Full Hypothesis Test Examples

Example \(\PageIndex{4}\)

Statistics students believe that the mean score on the first statistics test is 65. A statistics instructor thinks the mean score is higher than 65. He samples ten statistics students and obtains the scores 65 65 70 67 66 63 63 68 72 71. He performs a hypothesis test using a 5% level of significance. The data are assumed to be from a normal distribution.

Set up the hypothesis test:

A 5% level of significance means that \(\alpha = 0.05\). This is a test of a single population mean .

\(H_{0}: \mu = 65,\ H_{a}: \mu > 65\)

Since the instructor thinks the average score is higher, use a "\(>\)". The "\(>\)" means the test is right-tailed.

Determine the distribution needed:

Random variable: \(\bar{X} =\) average score on the first statistics test.

Distribution for the test: If you read the problem carefully, you will notice that there is no population standard deviation given . You are only given \(n = 10\) sample data values. Notice also that the data come from a normal distribution. This means that the distribution for the test is a student's \(t\).

Use \(t_{df}\). Therefore, the distribution for the test is \(t_{9}\) where \(n = 10\) and \(df = 10 - 1 = 9\).

The sample mean and sample standard deviation are calculated as 67 and 3.1972 from the data.

Calculate the \(p\)-value using the Student's \(t\)-distribution:

\[t_{obs} = \dfrac{\bar{x}-\mu_{\bar{x}}}{\left(\dfrac{s}{\sqrt{n}}\right)}=\dfrac{67-65}{\left(\dfrac{3.1972}{\sqrt{10}}\right)} = 1.9782\]

Use a t-table or Excel's T.DIST.RT() function to find the p-value:

\(p\text{-value} = P(\bar{x} > 67) = P(T > 1.9782) = 1 - 0.9604 = 0.0396\)

Interpretation of the p-value: If the null hypothesis is true, then there is a 0.0396 probability (3.96%) that the sample mean would be 67 or higher.

[Figure: normal curve of average test scores with 65 and 67 marked on the x-axis; the p-value is the shaded area to the right of 67.]

Compare \(\alpha\) and the \(p-\text{value}\):

Since \(\alpha = 0.05\) and \(p\text{-value} = 0.0396\), we have \(\alpha > p\text{-value}\).

Make a decision: Since \(\alpha > p\text{-value}\), reject \(H_{0}\).

This means you reject \(\mu = 65\). In other words, you believe the average test score is more than 65.

Conclusion: At a 5% level of significance, the sample data show sufficient evidence that the mean (average) test score is more than 65, just as the math instructor thinks.

The \(p\text{-value}\) can easily be calculated.

Put the data into a list. Press STAT and arrow over to TESTS . Press 2:T-Test . Arrow over to Data and press ENTER . Arrow down and enter 65 for \(\mu_{0}\), the name of the list where you put the data, and 1 for Freq: . Arrow down to \(\mu\): and arrow over to \(> \mu_{0}\). Press ENTER . Arrow down to Calculate and press ENTER . The calculator not only calculates the \(p\text{-value}\) (p = 0.0396) but it also calculates the test statistic ( t -score) for the sample mean, the sample mean, and the sample standard deviation. \(\mu > 65\) is the alternative hypothesis. Do this set of instructions again except arrow to Draw (instead of Calculate ). Press ENTER . A shaded graph appears with \(t = 1.9781\) (test statistic) and \(p = 0.0396\) (\(p\text{-value}\)). Make sure when you use Draw that no other equations are highlighted in \(Y =\) and the plots are turned off.
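An equivalent computation in Python with scipy, instead of the calculator:

```python
from scipy import stats

scores = [65, 65, 70, 67, 66, 63, 63, 68, 72, 71]

# Right-tailed one-sample t-test of H0: mu = 65 versus Ha: mu > 65
t_obs, p_value = stats.ttest_1samp(scores, popmean=65, alternative="greater")

print(f"t = {t_obs:.4f}, p = {p_value:.4f}")  # t ≈ 1.9782, p ≈ 0.0396
```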

Exercise \(\PageIndex{4}\)

It is believed that a stock price for a particular company will grow at a rate of $5 per week with a standard deviation of $1. An investor believes the stock won’t grow as quickly. The changes in stock price are recorded for ten weeks and are as follows: $4, $3, $2, $3, $1, $7, $2, $1, $1, $2. Perform a hypothesis test using a 5% level of significance. State the null and alternative hypotheses, find the p-value, state your conclusion, and identify the Type I and Type II errors.

  • \(H_{0}: \mu = 5\)
  • \(H_{a}: \mu < 5\)
  • \(p = 0.0082\)

Because \(p < \alpha\), we reject the null hypothesis. There is sufficient evidence to suggest that the stock price of the company grows at a rate less than $5 a week.

  • Type I Error: To conclude that the stock price is growing slower than $5 a week when, in fact, the stock price is growing at $5 a week (reject the null hypothesis when the null hypothesis is true).
  • Type II Error: To conclude that the stock price is growing at a rate of $5 a week when, in fact, the stock price is growing slower than $5 a week (do not reject the null hypothesis when the null hypothesis is false).

Example \(\PageIndex{5}\)

The National Institute of Standards and Technology provides exact data on conductivity properties of materials. Following are conductivity measurements for 11 randomly selected pieces of a particular type of glass.

1.11; 1.07; 1.11; 1.07; 1.12; 1.08; 0.98; 0.98; 1.02; 0.95; 0.95

Is there convincing evidence that the average conductivity of this type of glass is greater than one? Use a significance level of 0.05. Assume the population is normal.

Let’s follow a four-step process to answer this statistical question.

1. Set up the hypotheses: \(H_{0}: \mu \leq 1\) and \(H_{a}: \mu > 1\).
2. Plan: We are testing a sample mean without a known population standard deviation, so we use a Student's t-distribution. Assume the underlying population is normal.
3. Do the calculations: \(p\text{-value} = 0.036\).
4. State the conclusions: Since the \(p\text{-value}\) (= 0.036) is less than our alpha value, we reject the null hypothesis. It is reasonable to state that the data support the claim that the average conductivity level is greater than one.
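A quick check of this example's p-value with scipy:

```python
from scipy import stats

conductivity = [1.11, 1.07, 1.11, 1.07, 1.12, 1.08,
                0.98, 0.98, 1.02, 0.95, 0.95]

# Right-tailed one-sample t-test of H0: mu <= 1 versus Ha: mu > 1
t_obs, p_value = stats.ttest_1samp(conductivity, popmean=1,
                                   alternative="greater")

print(f"t = {t_obs:.3f}, p = {p_value:.3f}")  # p ≈ 0.036
```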

The hypothesis test itself has an established process. This can be summarized as follows:

  • Determine \(H_{0}\) and \(H_{a}\). Remember, they are contradictory.
  • Determine the random variable.
  • Determine the distribution for the test.
  • Draw a graph, calculate the test statistic, and use the test statistic to calculate the \(p\text{-value}\). (A \(t\)-score is an example of a test statistic.)
  • Compare the preconceived \(\alpha\) with the \(p\)-value, make a decision (reject or do not reject \(H_{0}\)), and write a clear conclusion using English sentences.

Notice that in performing the hypothesis test, you use \(\alpha\) and not \(\beta\). \(\beta\) is needed to help determine the sample size of the data used in calculating the \(p\text{-value}\). Remember that the quantity \(1 – \beta\) is called the Power of the Test. A high power is desirable; if the power is too low, statisticians typically increase the sample size while keeping \(\alpha\) the same, since with low power the null hypothesis might not be rejected when it should be.

Understanding Hypothesis Testing

Hypothesis testing involves formulating assumptions about population parameters based on sample statistics and rigorously evaluating these assumptions against empirical evidence. This article sheds light on the significance of hypothesis testing and the critical steps involved in the process.

What is Hypothesis Testing?

Hypothesis testing is a statistical method that is used to make a statistical decision using experimental data. Hypothesis testing is basically an assumption that we make about a population parameter. It evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data. 

Example: You say the average height in the class is 30, or that a boy is taller than a girl. Each of these is an assumption we are making, and we need some statistical way to prove it; we need a mathematical conclusion that whatever we are assuming is true.

Defining Hypotheses

  • Null hypothesis (\(H_0\)): the default assumption that there is no effect or no difference (for example, \(\mu = \mu_0\)).
  • Alternative hypothesis (\(H_a\)): the research claim that contradicts the null hypothesis, stating that there is an effect or a difference.

Key Terms of Hypothesis Testing

  • Significance level (\(\alpha\)): the probability of rejecting the null hypothesis when it is actually true, commonly set at 0.05.

  • P-value: The P value , or calculated probability, is the probability of finding the observed/extreme results when the null hypothesis(H0) of a study-given problem is true. If your P-value is less than the chosen significance level then you reject the null hypothesis i.e. accept that your sample claims to support the alternative hypothesis.
  • Test Statistic: The test statistic is a numerical value calculated from sample data during a hypothesis test, used to determine whether to reject the null hypothesis. It is compared to a critical value or p-value to make decisions about the statistical significance of the observed results.
  • Critical value : The critical value in statistics is a threshold or cutoff point used to determine whether to reject the null hypothesis in a hypothesis test.
  • Degrees of freedom: Degrees of freedom are associated with the variability or freedom one has in estimating a parameter. The degrees of freedom are related to the sample size and determine the shape.

Why do we use Hypothesis Testing?

Hypothesis testing is an important procedure in statistics. It evaluates two mutually exclusive population statements to determine which statement is better supported by the sample data. When we say that findings are statistically significant, it is thanks to hypothesis testing.

One-Tailed and Two-Tailed Test

One tailed test focuses on one direction, either greater than or less than a specified value. We use a one-tailed test when there is a clear directional expectation based on prior knowledge or theory. The critical region is located on only one side of the distribution curve. If the sample falls into this critical region, the null hypothesis is rejected in favor of the alternative hypothesis.

One-Tailed Test

There are two types of one-tailed test:

  • Right-tailed test: the critical region lies in the right tail; for example, H0: \mu \leq 50 versus H1: \mu > 50.
  • Left-tailed test: the critical region lies in the left tail; for example, H0: \mu \geq 50 versus H1: \mu < 50.

Two-Tailed Test

A two-tailed test considers both directions, greater than and less than a specified value. We use a two-tailed test when there is no specific directional expectation and we want to detect any significant difference; for example, H0: \mu = 50 versus H1: \mu \neq 50.

What are Type 1 and Type 2 errors in Hypothesis Testing?

In hypothesis testing, Type I and Type II errors are two possible errors that researchers can make when drawing conclusions about a population based on a sample of data. These errors are associated with the decisions made regarding the null hypothesis and the alternative hypothesis.

  • Type I error: rejecting the null hypothesis when it is actually true (a false positive). Its probability is the significance level \alpha.
  • Type II error: failing to reject the null hypothesis when it is actually false (a false negative). Its probability is denoted \beta, and 1 - \beta is the power of the test.

How does Hypothesis Testing work?

Step 1 – Define the Null and Alternative Hypotheses

  • Null hypothesis (H0): the statement of no effect that we seek to test.
  • Alternative hypothesis (H1): the contradictory statement we will favor if the data provide sufficient evidence against H0.

We first identify the problem about which we want to make an assumption, keeping in mind that the two hypotheses must contradict one another; here we assume normally distributed data.

Step 2 – Choose the Significance Level

The significance level \alpha is the probability of rejecting a true null hypothesis (a Type I error). It is commonly set at 0.05, meaning we accept a 5% risk of a false positive.

Step 3 – Collect and Analyze data.

Gather relevant data through observation or experimentation. Analyze the data using appropriate statistical methods to obtain a test statistic.

Step 4 – Calculate the Test Statistic

The data are evaluated in this step: we compute a score based on the characteristics of the data. The choice of test statistic depends on the type of hypothesis test being conducted.

There are various hypothesis tests, each appropriate for a different goal. The statistic could come from a Z-test, Chi-square test, T-test, and so on.

  • Z-test: used when the population mean and standard deviation are known; the Z-statistic is commonly used.
  • t-test: used when the population standard deviation is unknown and the sample size is small; the t-statistic is more appropriate.
  • Chi-square test: used for categorical data or for testing independence in contingency tables.
  • F-test : F-test is often used in analysis of variance (ANOVA) to compare variances or test the equality of means across multiple groups.

Since we have a smaller dataset, the T-test is more appropriate to test our hypothesis.

T-statistic is a measure of the difference between the means of two groups relative to the variability within each group. It is calculated as the difference between the sample means divided by the standard error of the difference. It is also known as the t-value or t-score.

Step 5 – Compare the Test Statistic

In this stage, we decide whether to reject or fail to reject the null hypothesis. There are two ways to make this decision.

Method A: Using Critical Values

Comparing the test statistic with the tabulated critical value, we have:

  • If Test Statistic > Critical Value: Reject the null hypothesis.
  • If Test Statistic ≤ Critical Value: Fail to reject the null hypothesis.

Note: Critical values are predetermined threshold values used to make a decision in hypothesis testing. To determine critical values, we typically refer to a statistical distribution table, such as the normal or t-distribution table, chosen according to the distribution of the test statistic and its degrees of freedom.
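For instance, critical values can be looked up in Python rather than in a printed table; a small sketch (the values in the comments are approximate):

```python
from scipy.stats import norm, t

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)     # two-tailed z critical value, ~1.96
t_crit = t.ppf(1 - alpha / 2, df=9)  # two-tailed t critical value, ~2.262 for df = 9
```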

Method B: Using P-values

We can also come to a conclusion using the p-value:

  • If p \leq \alpha: Reject the null hypothesis.
  • If p > \alpha: Fail to reject the null hypothesis.

Note: The p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the one observed in the sample, assuming the null hypothesis is true. To determine the p-value, we typically refer to a statistical distribution table, such as the normal or t-distribution table, chosen according to the distribution of the test statistic.

Step 6 – Interpret the Results

At last, we can conclude our experiment using method A or B.

Calculating test statistic

To validate our hypothesis about a population parameter we use statistical functions. We use the z-score, p-value, and level of significance (alpha) to gather evidence for our hypothesis, assuming normally distributed data.

1. Z-statistics:

Used when the population mean and standard deviation are known:

z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}

  • \bar{x} is the sample mean,
  • μ represents the population mean,
  • σ is the population standard deviation, and
  • n is the size of the sample.
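As a quick illustrative sketch, the formula can be computed directly (the inputs below are hypothetical; they anticipate the cholesterol example later in the article):

```python
import math

def z_statistic(x_bar, mu, sigma, n):
    """Compute z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

# Hypothetical inputs: sample mean 202.04, mu = 200, sigma = 5, n = 25
print(z_statistic(202.04, 200, 5, 25))  # ≈ 2.04
```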

2. T-Statistics

The t-test is used when the sample size is small (n < 30) and the population standard deviation is unknown. The t-statistic is given by:

t = \frac{\bar{x} - \mu}{s/\sqrt{n}}

  • t = t-score,
  • x̄ = sample mean
  • μ = population mean,
  • s = standard deviation of the sample,
  • n = sample size
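A minimal sketch of this computation in Python, with hypothetical data; scipy's ttest_1samp returns the same statistic plus a two-tailed p-value:

```python
import numpy as np
from scipy import stats

sample = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.3])  # hypothetical measurements
mu0 = 12.0                                               # hypothesized population mean

# Manual computation following the formula above
t_manual = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(len(sample)))

# Equivalent library call; also returns the two-tailed p-value
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)
print(t_manual, t_stat, p_value)
```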

3. Chi-Square Test

The Chi-Square test for independence is used for categorical data (non-normally distributed), using:

\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}

  • i, j are the row and column indices respectively,
  • O_{ij} is the observed frequency in cell (i, j), and
  • E_{ij} is the expected frequency in cell (i, j) under the null hypothesis of independence.
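For illustration, a small sketch using scipy.stats.chi2_contingency, which computes the expected counts and the statistic above from an observed contingency table (the counts here are hypothetical):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: treatment group vs. outcome counts
observed = np.array([[25, 15],
                     [10, 30]])

chi2_stat, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2_stat:.3f}, p = {p_value:.4f}, df = {dof}")
```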

Real-Life Hypothesis Testing Examples

Let's examine hypothesis testing using two real-life situations:

Case A: Does a New Drug Affect Blood Pressure?

Imagine a pharmaceutical company has developed a new drug that they believe can effectively lower blood pressure in patients with hypertension. Before bringing the drug to market, they need to conduct a study to assess its impact on blood pressure.

  • Before Treatment: 120, 122, 118, 130, 125, 128, 115, 121, 123, 119
  • After Treatment: 115, 120, 112, 128, 122, 125, 110, 117, 119, 114

Step 1: Define the Hypotheses

  • Null Hypothesis (H0): The new drug has no effect on blood pressure.
  • Alternate Hypothesis (H1): The new drug has an effect on blood pressure.

Step 2: Define the Significance Level

Let’s set the significance level at 0.05: we will reject the null hypothesis if the evidence suggests less than a 5% chance of observing the results due to random variation alone.

Step 3: Compute the Test Statistic

Using a paired T-test, we analyze the data to obtain a test statistic and a p-value.

The test statistic (e.g., T-statistic) is calculated based on the differences between blood pressure measurements before and after treatment.

t = \frac{m}{s/\sqrt{n}}

where:

  • m = mean of the differences, d_i = X_{after,i} - X_{before,i},
  • s = standard deviation of the differences d_i,
  • n = sample size.

Here, m = -3.9, s ≈ 1.37, and n = 10.

We calculate the T-statistic as t = -3.9 / (1.37/\sqrt{10}) ≈ -9, based on the formula for the paired t-test.

Step 4: Find the p-value

With a calculated t-statistic of -9 and df = 9 degrees of freedom, you can find the p-value using statistical software or a t-distribution table.

Thus, the two-tailed p-value ≈ 8.54e-06.

Step 5: Result

  • If the p-value is less than or equal to 0.05, the researchers reject the null hypothesis.
  • If the p-value is greater than 0.05, they fail to reject the null hypothesis.

Conclusion: Since the p-value (≈ 8.54e-06) is less than the significance level (0.05), the researchers reject the null hypothesis. There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different.

Python Implementation of Hypothesis Testing

Let’s implement hypothesis testing in Python, testing whether a new drug affects blood pressure. For this example, we will use a paired T-test from the scipy.stats library.

SciPy is a scientific computing library for Python, widely used for statistical and mathematical computations.

We will implement our first real-life problem in Python:
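A minimal sketch of what such an implementation might look like, using scipy.stats.ttest_rel for the paired T-test on the data given above:

```python
import numpy as np
from scipy import stats

before = np.array([120, 122, 118, 130, 125, 128, 115, 121, 123, 119])
after = np.array([115, 120, 112, 128, 122, 125, 110, 117, 119, 114])

# Paired T-test on the before/after measurements
t_stat, p_value = stats.ttest_rel(after, before)
print(f"T-statistic: {t_stat:.3f}, p-value: {p_value:.3e}")  # expect t ≈ -9.0, p ≈ 8.54e-06

alpha = 0.05
if p_value <= alpha:
    print("Reject the null hypothesis: the drug appears to affect blood pressure.")
else:
    print("Fail to reject the null hypothesis.")
```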

In the above example, given the T-statistic of approximately -9 and an extremely small p-value, the results indicate a strong case to reject the null hypothesis at a significance level of 0.05. 

  • The results suggest that the new drug, treatment, or intervention has a significant effect on lowering blood pressure.
  • The negative T-statistic indicates that the mean blood pressure after treatment is significantly lower than the assumed population mean before treatment.

Case B : Cholesterol level in a population

Data: A sample of 25 individuals is taken, and their cholesterol levels are measured.

Cholesterol Levels (mg/dL): 205, 198, 210, 190, 215, 205, 200, 192, 198, 205, 198, 202, 208, 200, 205, 198, 205, 210, 192, 205, 198, 205, 210, 192, 205.

Population Mean (hypothesized): 200 mg/dL

Population Standard Deviation (σ): 5 mg/dL (given for this problem)

Step 1: Define the Hypothesis

  • Null Hypothesis (H 0 ): The average cholesterol level in a population is 200 mg/dL.
  • Alternate Hypothesis (H 1 ): The average cholesterol level in a population is different from 200 mg/dL.

As the direction of deviation is not given, we assume a two-tailed test.

Step 2: Define the Significance Level

We take a significance level of 0.05. Based on the standard normal (z) table, the two-tailed critical values at this level are approximately -1.96 and 1.96.

Step 3: Compute the Test Statistic

The sample mean of the 25 measurements is 202.04 mg/dL, so

z = \frac{202.04 - 200}{5 / \sqrt{25}} = 2.04

Step 4: Result

Since the absolute value of the test statistic (2.04) is greater than the critical value (1.96), we reject the null hypothesis. We conclude that there is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.
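For reference, a small sketch reproducing this z-test in Python, using the sample data above and scipy.stats.norm for the two-tailed p-value:

```python
import numpy as np
from scipy.stats import norm

levels = np.array([205, 198, 210, 190, 215, 205, 200, 192, 198, 205,
                   198, 202, 208, 200, 205, 198, 205, 210, 192, 205,
                   198, 205, 210, 192, 205])
mu0, sigma = 200, 5  # hypothesized mean and known population standard deviation

z = (levels.mean() - mu0) / (sigma / np.sqrt(len(levels)))  # ≈ 2.04
p_two_tailed = 2 * norm.sf(abs(z))                          # ≈ 0.041
print(z, p_two_tailed)
```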

Limitations of Hypothesis Testing

  • Although a useful technique, hypothesis testing does not offer a comprehensive grasp of the topic being studied. Without fully reflecting the intricacy or whole context of the phenomena, it concentrates on certain hypotheses and statistical significance.
  • The accuracy of hypothesis testing results is contingent on the quality of available data and the appropriateness of statistical methods used. Inaccurate data or poorly formulated hypotheses can lead to incorrect conclusions.
  • Relying solely on hypothesis testing may cause analysts to overlook significant patterns or relationships in the data that are not captured by the specific hypotheses being tested. This limitation underscores the importance of complementing hypothesis testing with other analytical approaches.

Hypothesis testing stands as a cornerstone in statistical analysis, enabling data scientists to navigate uncertainties and draw credible inferences from sample data. By systematically defining null and alternative hypotheses, choosing significance levels, and leveraging statistical tests, researchers can assess the validity of their assumptions. The article also elucidates the critical distinction between Type I and Type II errors, providing a comprehensive understanding of the nuanced decision-making process inherent in hypothesis testing. The real-life example of testing a new drug’s effect on blood pressure using a paired T-test showcases the practical application of these principles, underscoring the importance of statistical rigor in data-driven decision-making.

Frequently Asked Questions (FAQs)

1. What are the 3 types of hypothesis tests?

There are three types of hypothesis tests: right-tailed, left-tailed, and two-tailed. Right-tailed tests assess whether a parameter is greater than a specified value, left-tailed whether it is less, and two-tailed tests check for a difference in either direction.

2. What are the 4 components of hypothesis testing?

Null Hypothesis (H0): No effect or difference exists. Alternative Hypothesis (H1): An effect or difference exists. Significance Level (α): Risk of rejecting the null hypothesis when it is true (Type I error). Test Statistic: Numerical value representing observed evidence against the null hypothesis.

3. What is hypothesis testing in ML?

Statistical method to evaluate the performance and validity of machine learning models. Tests specific hypotheses about model behavior, like whether features influence predictions or if a model generalizes well to unseen data.

4. What is the difference between Pytest and Hypothesis in Python?

Pytest is a general-purpose testing framework for Python code, while Hypothesis is a property-based testing framework for Python, focused on generating test cases based on specified properties of the code.



8.4: Hypothesis Test on a Single Standard Deviation


A test of a single standard deviation assumes that the underlying distribution is normal . The null and alternative hypotheses are stated in terms of the population standard deviation (or population variance). The test statistic is:

\[\chi^{2} = \frac{(n-1)s^{2}}{\sigma^{2}} \label{test}\]

  • \(n\) is the total number of data values
  • \(s^{2}\) is the sample variance
  • \(\sigma^{2}\) is the population variance

The requirements to be able to perform a hypothesis test on a population standard deviation are:

  • the sample must be obtained from a simple random sample or from a randomized experiment
  • the population has a normal distribution

You may think of \(s\) as the random variable in this test. The number of degrees of freedom is \(df = n - 1\). A test of a single standard deviation may be right-tailed, left-tailed, or two-tailed. The next example will show you how to set up the null and alternative hypotheses. The null and alternative hypotheses contain statements about the population variance.

Example 8.4.1

Math instructors are not only interested in how their students do on exams, on average, but how the exam scores vary. To many instructors, the standard deviation may be more important than the average.

Suppose a math instructor believes that the standard deviation for his final exam is five points. One of his best students thinks otherwise. The student claims that the standard deviation is more than five points. If the student were to conduct a hypothesis test, what would the null and alternative hypotheses be?

  • \(H_{0}: \sigma = 5\)
  • \(H_{a}: \sigma > 5\)

Exercise 8.4.2

A SCUBA instructor wants to record the depths to which each of his students dives during their checkout. He is interested in how the depths vary, even though everyone should have been at the same depth. He believes the standard deviation is three feet. His assistant thinks the standard deviation is less than three feet. Suppose the instructor finds a random sample of 25 SCUBA students and finds that the sample standard deviation is 2.8 feet.

With a significance level of 5%, test the claim that the standard deviation of the diving depths is less than three feet.

\(H_{0}: \sigma = 3\)

\(H_{a}: \sigma < 3\)

The word "less" tells you this is a left-tailed test.

Distribution for the test: \(\chi^{2}_{24}\), where \(n = \text{the number of students sampled} = 25\) and \(df = n - 1 = 25 - 1 = 24\)

Calculate the test statistic (Equation \ref{test}):

\[\chi^{2} = \frac{(n-1)s^{2}}{\sigma^{2}} = \frac{(25-1)(2.8)^{2}}{3^{2}} = 20.91 \nonumber\]

where \(n = 25\), \(s = 2.8\), and \(\sigma = 3\).


Probability statement: \(p\text{-value} = P(\chi^{2} < 20.91) = 0.356\)

In 2nd DISTR , use 7:χ2cdf . The syntax is (lower, upper, df) for the parameter list. For Example , χ2cdf(-1E99,20.91,24) . The \(p\text{-value} = 0.356\).

Compare \(\alpha\) and the \(p\text{-value}\) :

Given \(\alpha = 0.05\) and \(p\text{-value} = 0.356\), we have \(\alpha < p\text{-value}\).

Make a decision: Since \(\alpha < p\text{-value}\), fail to reject \(H_{0}\). This means that you do not reject \(\sigma = 3\). In other words, you do not have evidence that the standard deviation of the diving depths is less than three feet.

Conclusion: At a 5% level of significance, from the data, there is not sufficient evidence to conclude that the standard deviation of the SCUBA students' diving depths is less than three feet.
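For readers working outside the calculator, a minimal sketch of the same computation in Python, assuming scipy is available:

```python
from scipy.stats import chi2

n, s, sigma0 = 25, 2.8, 3.0
test_stat = (n - 1) * s**2 / sigma0**2   # ≈ 20.91
p_value = chi2.cdf(test_stat, df=n - 1)  # left-tailed p-value ≈ 0.356
print(test_stat, p_value)
```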


To test variability, use the chi-square test of a single variance. The test may be left-, right-, or two-tailed, and its hypotheses are always expressed in terms of the variance (or standard deviation).

Formula Review

\(\chi^{2} = \frac{(n-1) \cdot s^{2}}{\sigma^{2}}\) Test of a single variance statistic, where:

  • \(n\): sample size
  • \(s\): sample standard deviation
  • \(\sigma\): population standard deviation
  • \(df = n - 1\): degrees of freedom

Test of a Single Standard Deviation

  • Use the test to draw conclusions about a population standard deviation (or variance).
  • The degrees of freedom is the \(\text{sample size} - 1\).
  • The test statistic is \(\frac{(n-1) \cdot s^{2}}{\sigma^{2}}\), where \(n = \text{the total number of data}\), \(s^{2} = \text{sample variance}\), and \(\sigma^{2} = \text{population variance}\).
  • The test may be left-, right-, or two-tailed.

Control of false discoveries in grouped hypothesis testing for eQTL data

  • Pratyaydipta Rudra 1 ,
  • Yi-Hui Zhou 2 ,
  • Andrew Nobel 3 &
  • Fred A. Wright 2  

BMC Bioinformatics volume 25, Article number: 147 (2024)


Expression quantitative trait locus (eQTL) analysis aims to detect the genetic variants that influence the expression of one or more genes. Gene-level eQTL testing forms a natural grouped-hypothesis testing strategy with clear biological importance. Methods to control family-wise error rate or false discovery rate for group testing have been proposed earlier, but may not be powerful or easily apply to eQTL data, for which certain structured alternatives may be defensible and may enable the researcher to avoid overly conservative approaches.

In an empirical Bayesian setting, we propose a new method to control the false discovery rate (FDR) for grouped hypotheses. Here, each gene forms a group, with SNPs annotated to the gene corresponding to individual hypotheses. The heterogeneity of effect sizes in different groups is considered by the introduction of a random effects component. Our method, entitled Random Effects model and testing procedure for Group-level FDR control (REG-FDR), assumes a model for alternative hypotheses for the eQTL data and controls the FDR by adaptive thresholding. As a convenient alternate approach, we also propose Z-REG-FDR, an approximate version of REG-FDR, that uses only Z-statistics of association between genotype and expression for each gene-SNP pair. The performance of Z-REG-FDR is evaluated using both simulated and real data. Simulations demonstrate that Z-REG-FDR performs similarly to REG-FDR, but with much improved computational speed.

Our results demonstrate that the Z-REG-FDR method performs favorably compared to other methods in terms of statistical power and control of FDR. It can be of great practical use for grouped hypothesis testing for eQTL analysis or similar problems in statistical genomics due to its fast computation and ability to be fit using only summary data.


Expression quantitative trait locus (eQTL) analysis aims to detect genetic loci that are associated with the expression of one or more genes [ 1 ]. For each gene, expression can be considered as a quantitative trait potentially associated with the genotypes at different sites in the genome, typically single nucleotide polymorphisms (SNPs) [ 2 ]. Although there is a substantial literature on both eQTL mapping [ 3 , 4 , 5 ] and grouped hypothesis testing [ 6 , 7 , 8 ], consideration of the natural gene-level grouping of the SNPs, e.g., SNPs local to a gene for a cis-eQTL problem, is comparatively unexplored or requires permutation methods or approximations [ 9 , 10 ]. Analysis of gene-level eQTLs and meaningful consideration of causal SNPs is an important biological problem [ 11 ]. Testing whether there is any eQTL (local SNP) for an entire gene while controlling the false discovery rate (FDR) across the set of all genes is of interest for various reasons, and has been addressed only imperfectly by the "e-Gene" concept employed by the GTEx Consortium [ 12 ].

Local ( cis ) eQTL testing includes tests of individual SNPs nearby a gene, which leads to summaries at the gene level [ 12 ]. The natural hierarchical organization would suggest standard methods for group-level testing [ 6 , 13 ]. However, local eQTL testing can include additional structure to be exploited: (i) the number of cis-eQTLs is typically large, so that explicit consideration of the proportion and “strength” of alternatives is possible; (ii) the conditional analyses of discovered eQTLs suggest that, to a first approximation, most local eQTLs can be considered unique within the region [ 14 ]; (iii) correlation of test statistics is driven by regional SNP correlation.

In the following sections, we discuss the structure of eQTL data and how the grouped nature can be effectively modeled using a random effects model. We consider the case of cis -eQTLs, i.e. local to the gene [ 14 , 15 ], where the variant affecting the gene expression is in the immediate neighborhood of the gene. Our proposed method, entitled Random Effects model and testing procedure for Group-level FDR control ( REG-FDR ), uses an empirical Bayes framework to model the eQTL data and controls the FDR by adaptive thresholding. We also propose an alternate approach Z-REG-FDR , an approximate version of REG-FDR , that uses only the summary measures given by the Z-statistics of association between genotype and expression for each gene-SNP pair. We demonstrate using simulations and real data analysis that this approximate version performs well compared to other possible approaches while having a much faster computation time.

Structure of the eQTL data and the hypotheses

eQTL data can typically be expressed in the form of an expression matrix, consisting of N genes, along with a genotype matrix which has genotypes (m SNPs) for the same n sample units. We denote the expression matrix as \(Y_{N \times n}\) and the genotype submatrix corresponding to the i th gene as \(X^{(i)}_{m_i \times n}\), \(i=1,2,\ldots,N\), where \(m_i\) is the number of SNPs local to the i th gene. Linear modeling of eQTLs typically includes additional covariates, such as expression cofactors [ 12 , 16 ]. The t-statistics for the partial correlations between Y and \(X^{(i)}_{m_i \times n}\), after both are adjusted for covariates, are equivalent to the Wald statistics for \(X^{(i)}_{m_i \times n}\) when conducting the full linear regression in which Y is modeled as a function of \(X^{(i)}_{m_i \times n}\) and the additional covariates [ 17 , 18 ]. We assume that the sample size n is large enough that the residual degrees of freedom for the t statistic is sufficient to use a standard normal approximation. Thus we can directly work with z-statistics for Y and \(X^{(i)}_{m_i \times n}\), where we consider each of these matrices to have been covariate-residualized.

Let \(H_{0ij}\) denote the gene-SNP level null hypothesis that there is no eQTL at the j th SNP local to the i th gene, \(j=1,2,\ldots ,m_i, i=1,2,\ldots ,N\) . Therefore there are \(\sum _{i=1}^N{m_i}\) gene-SNP level tests. These tests can be grouped into N groups corresponding to the N genes with \(m_i\) tests in the i th group. Define \(H_{0i}\) to be the gene-level null hypothesis for the i th gene that there is no eQTL for the i th gene. Therefore the gene-level null hypothesis \(H_{0i}\) can be written as

\[H_{0i} = \bigcap_{j=1}^{m_i} H_{0ij},\]

i.e. the gene-level null requires that all of the corresponding \(m_i\) gene-SNP level hypotheses be null.

An empirical Bayes model

We adopt an empirical Bayes approach for controlling the gene-level FDR. Empirical Bayes approaches have been used in many genetic applications, and indeed these applications have been a prime motivator for the methods [ 19 , 20 ]. The advantages of using an empirical Bayes approach based on the local false discovery rate (lfdr), instead of p-value-based FDR-controlling approaches, have been discussed in [ 21 ] and [ 22 ]. The lfdr corresponding to the gene-level null hypothesis \(H_{0i}\) is

\[\lambda_{i} = \lambda_i(Y_i, X^{(i)}) = P(H_{0i} \mid Y_i, X^{(i)}).\]

Here \(Y_i\) denotes the i th row of the matrix Y. If we obtain the lfdr \(\lambda_{i}\) for each of the gene-level hypotheses, we can control the FDR at target level \(\alpha\) for gene-level testing, using the following adaptive thresholding procedure, which has been used extensively in the literature [ 7 , 23 , 24 , 25 ].

Enumerate the index \(i_1,i_2,\ldots ,i_N\) of the genes such that \(\lambda _{i_1} \le \lambda _{i_2} \le \cdots \le \lambda _{i_N}\) .

Reject hypotheses \(H_{0i_1},\ldots,H_{0i_L}\) where L is the largest integer such that

\[\frac{1}{L}\sum_{l=1}^{L} \lambda_{i_l} \le \alpha.\]
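In code, this amounts to sorting the estimated lfdr values and finding the largest prefix whose running mean is at or below \(\alpha\); a minimal illustrative sketch (not the authors' implementation), assuming a numpy vector of estimated lfdr values:

```python
import numpy as np

def adaptive_threshold(lfdr, alpha=0.05):
    """Reject the hypotheses with the smallest lfdr values, taking the
    largest L such that the mean of the L smallest lfdrs is <= alpha."""
    lfdr = np.asarray(lfdr)
    order = np.argsort(lfdr)
    running_mean = np.cumsum(lfdr[order]) / np.arange(1, lfdr.size + 1)
    below = np.nonzero(running_mean <= alpha)[0]
    L = below[-1] + 1 if below.size else 0
    reject = np.zeros(lfdr.size, dtype=bool)
    reject[order[:L]] = True
    return reject
```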

[ 24 ] and subsequently [ 7 ] showed that the adaptive thresholding procedure is valid in the sense that it controls the FDR at target level \(\alpha \) for an ‘oracle’ procedure where the true parameters of the model are assumed to be known. It is asymptotically valid for a ‘data-driven’ procedure when the parameters are consistently estimated from the data. [ 25 ] proved its validity under further relaxed conditions. The proof makes use of the following result (Averaging Theorem, [ 19 ]).

Let \(\text{lfdr}(z)=P(H_0|z)\) denote the lfdr for observed data z. Then, for a rejection region \({\mathcal {R}}\), the FDR will be given by

\[\text{FDR}({\mathcal {R}}) = E\left[\text{lfdr}(z) \mid z \in {\mathcal {R}}\right].\]

The adaptive thresholding procedure can be used to control the FDR for testing the gene-level hypotheses \(H_{0i}\) ’s and a similar procedure can be used to test the gene-SNP level hypotheses \(H_{0ij}\) ’s. However, obtaining the gene-level lfdr’s is a non-trivial problem. In the next section, we propose a model which enables us to calculate the lfdr’s.

The random effects model and testing procedure for group-level FDR control ( REG-FDR )

Here we propose a model to obtain the gene-level lfdr values, that can be subsequently used to test the gene-level hypotheses while controlling the FDR using the adaptive thresholding method. The model is based on the following assumptions.

(A1) For any gene i, under the gene-level alternative hypothesis \(H_{0i}^c\), there exists a single causal SNP that influences its expression.

(A2) Each of the \(m_i\) SNPs has equal probability of being the causal SNP.

First, we note that Assumption (A1) is at best a simplification, but very large eQTL studies have supported the view that most genes with eQTLs have a primary local eQTL [ 26 ], with other loci having much smaller effect sizes. We therefore treat A1 as a ‘workable condition’ [ 27 , 28 , 29 ].

Assumption (A2) could easily be relaxed, and one might use a distributional assumption over the SNPs as a modest modification of our method below (see the Discussion section). We note that it is trivial to enforce Assumption (A2) by, for example, randomizing the SNP identities within gene i prior to analysis.

Under these assumptions, the gene-level lfdr for the i th gene has the following form:

\[\lambda_i(Y_i, X^{(i)}) = \frac{\pi_0 f_0(Y_i)}{\pi_0 f_0(Y_i) + (1-\pi_0)\frac{1}{m_i}\sum_{j=1}^{m_i} E_{\beta_{ij}}\left[f_1(Y_i \mid X_j^{(i)}, \beta_{ij})\right]} \tag{5}\]

where \(\pi_0=P(H_{0i})\) is the prior probability of \(H_{0i}\), \(f_0(Y_i)\) is the density of \(Y_i\) under the null, and \(f_1(Y_i|X_j^{(i)},\beta_{ij})\) is the conditional density under the alternative given that the j th SNP is causal; the expectation is over the random effect \(\beta_{ij}\). Here \(\beta_{ij}\) is the correlation between the expression of the i th gene and the causal SNP j. Note that the marginal density \(p(X^{(i)})\) cancels from numerator and denominator. Importantly, this cancellation allows us to bypass the modeling of the dependence structure of the SNPs, which otherwise might have been difficult to estimate accurately.

We assume that \(f_0(.)\) is the density of the \(N_n(0,I_n)\) distribution (noting that expression data can be normalized), and that \(f_1(.|X_j^{(i)},\beta _{ij})\) is the density of the \(N_n(\beta _{ij} X_j^{(i)}, (1-\beta _{ij}^2)I_n)\) distribution, where \(\beta _{ij}\) is the correlation between \(Y_i\) and \(X_j^{(i)}\) . This choice of \(f_1\) ensures that the unconditional variance of \(Y_i\) is free of \(\beta _{ij}\) . To account for variability across genes, we assume \(\beta _{ij}\) to be a random effect such that \(\sqrt{n-3} \ tanh^{-1}(\beta _{ij})\) follows a \(N(0,\sigma ^2)\) distribution. As \(\beta _{ij}\) is a correlation coefficient, the Fisher transformation is used to ensure that the variance does not depend on the mean. Moreover, \(\sigma \) will be estimated from the data, and so the apparent dependence on n is not important to the procedure.

Our procedure treats the genotype values as fixed, and assuming the expression of genes to be independent, given genotypes, we can estimate \(\pi _0\) and \(\sigma \) using a maximum likelihood approach and follow with plug-in estimates to obtain estimates of \(\lambda _i(Y_i,X^{(i)})\) from Eq.  5 . The assumption that the expression of different genes are independent is violated in general, but our approach can be viewed as employing a composite likelihood [ 30 ], and thus consistent for \(\pi _0\) and \(\sigma \) even under independence violations [ 31 ]. An EM algorithm is used (see Additional file 1 : Section 1) for the maximum likelihood estimation. The procedure enables us to use the adaptive thresholding procedure to provide proper gene-level control of the FDR.

The Z-REG-FDR model

One computational challenge presented by the REG-FDR model is that the density \(f_1(Y_i|X_j^{(i)})\) does not have a closed form expression. While it can be expressed as the following integral

\[f_1(Y_i \mid X_j^{(i)}) = \int f_1(Y_i \mid X_j^{(i)}, \beta_{ij})\, g_\sigma(\beta_{ij})\, d\beta_{ij},\]

where \(g_\sigma\) denotes the density of \(\beta_{ij}\) implied by the random effects assumption, numeric maximum likelihood estimation is computationally burdensome. We propose an alternative model, termed Z-REG-FDR, which avoids this problem. In this approach, we consider the Fisher transformed and scaled z-statistics as our data. Thus, for each gene i, we have a vector of z-statistics

\(z^{(i)} = (z_{1}^{(i)},z_{2}^{(i)},\ldots ,z_{m_i}^{(i)}) , \ i=1,2,\ldots ,N,\)

where \(z_{j}^{(i)}=\sqrt{n-3} \ tanh^{-1}(r_{j}^{(i)})\) and \(r_{j}^{(i)}\) is the sample correlation of \(Y_i\) and \(X_j^{(i)}\) .

Fisher transformation and scaling ensure that \(z^{(i)}\) is approximately normal and that the variance of each component is approximately 1 under both null and alternative. Under the null, the mean of \(z^{(i)}\) is zero. We treat the vectors \(z^{(i)}\) as if they were independent across different genes, again relying on approximate conditional independence (given genotypes) and a composite likelihood interpretation.

The Z-REG-FDR procedure is based on an additional assumption to (A1) and (A2) above. If the k th SNP is causal, we assume (Assumption (A3)) that the distribution of \((z_{1}^{(i)},\ldots,z_{k-1}^{(i)},z_{k+1}^{(i)},\ldots,z_{m_i}^{(i)})\) given \(z_{k}^{(i)}\) under the alternative is the same as that under the null. In particular, we note that this assumption is true if the components of \(z^{(i)}\) have a Markov dependence structure with the same serial correlation under null and alternative, which is true in the special case that the successive marker correlations are zero. In general, this assumption can be violated, but as shown in " Simulations: performance of Z-REG-FDR as an approximate maximum likelihood estimation " section, the resultant procedure appears to work well in many circumstances as an approximate maximum likelihood method even when Assumption (A3) is not satisfied.

Under the above assumptions, we can write the joint distribution of the random vector \(z^{(i)}=(z_{1}^{(i)},z_{2}^{(i)},\ldots,z_{m_i}^{(i)})\) as

\[f_0^{(i)}(z^{(i)}) = p_0(z_{k}^{(i)})\, f_{0|k}(z_{1}^{(i)},\ldots,z_{k-1}^{(i)},z_{k+1}^{(i)},\ldots,z_{m_i}^{(i)} \mid z_{k}^{(i)})\]

under the null (for any choice of \(k\)), and, averaging over the equally likely causal SNP,

\[f_1^{(i)}(z^{(i)}) = \frac{1}{m_i}\sum_{k=1}^{m_i} p_1(z_{k}^{(i)})\, f_{0|k}(z_{1}^{(i)},\ldots,z_{k-1}^{(i)},z_{k+1}^{(i)},\ldots,z_{m_i}^{(i)} \mid z_{k}^{(i)})\]

under the alternative. We assume \(p_0(.)\) to be \(N(0, 1)\) and \(p_1(.)\) to be \(N(\mu,1)\), where \(\mu\) is assumed to be random with a \(N(0,\sigma^2)\) distribution. We do not assume anything about the form of \(f_{0|k}\) except that it does not involve the parameters \(\pi_0\) and \(\sigma\). Under these assumptions, the gene-level lfdr for this model reduces to

\[\lambda_i = \frac{\pi_0}{\pi_0 + (1-\pi_0)\frac{1}{m_i}\sum_{k=1}^{m_i} p_1(z_{k}^{(i)})/p_0(z_{k}^{(i)})}. \tag{9}\]

This follows from the cancellation of \(f_{0|k}(z_{1}^{(i)},\ldots,z_{k-1}^{(i)},z_{k+1}^{(i)},\ldots,z_{m_i}^{(i)})\) in the numerator and denominator. While estimating \(\pi_0\) and \(\sigma\), a similar cancellation helps us bypass maximizing the full (approximate) likelihood

\[\prod_{i=1}^{N}\left[\pi_0 f_0^{(i)}(z^{(i)}) + (1-\pi_0) f_1^{(i)}(z^{(i)})\right].\]

Instead, we maximize

\[\prod_{i=1}^{N}\left[\pi_0 + (1-\pi_0)\frac{1}{m_i}\sum_{k=1}^{m_i} \frac{p_1(z_{k}^{(i)})}{p_0(z_{k}^{(i)})}\right].\]

This is equivalent to the maximum likelihood estimation under the assumption that \(f_{0|k}\) does not involve the parameters \(\pi _0\) and \(\sigma \) . Note that we need to estimate only the parameters \(\pi _0\) and \(\sigma \) to obtain the gene-level lfdr using Eq.  9 .

When the required assumptions are not satisfied, this method still has value as an approximate maximum likelihood approach. For instance, when the \(X_j^{(i)}\) ’s are related by an AR(1) structure, it can be shown that the correlation between the z -statistics depends on the effect size, i.e. the correlation between \(Y_i\) and the causal SNP, hence violating Assumption (A3). Additional file 1 : Lemma 1 and Additional file 1 : Figures 1 and 2 show the extent to which the conditional distribution \(f_{0|k}\) might depend on the effect size for any correlation structure among normally distributed SNPs. However, our results in “ Simulations: performance of Z-REG-FDR as an approximate maximum likelihood estimation ” section demonstrate that it does not have a significant adverse effect on the performance of the estimation and control of false discovery.
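To make the preceding formulas concrete, here is a minimal illustrative sketch (not the authors' released code) of the gene-level lfdr computation for a single gene under the reconstruction of Eq. 9 above, assuming \(\pi_0\) and \(\sigma\) have already been estimated (e.g., by maximizing the pseudo-likelihood):

```python
import numpy as np
from scipy.stats import norm

def gene_lfdr(z, pi0, sigma):
    """Gene-level lfdr for one gene from its vector of Fisher-transformed,
    scaled z-statistics. Under the null each z_k is N(0, 1); under the
    alternative the causal SNP's z_k is marginally N(0, 1 + sigma^2)."""
    z = np.asarray(z)
    density_ratio = norm.pdf(z, scale=np.sqrt(1 + sigma**2)) / norm.pdf(z)
    return pi0 / (pi0 + (1 - pi0) * density_ratio.mean())
```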

Simulations: performance of Z-REG-FDR when all assumptions are satisfied

First, we conducted a simulation study to explore the performance of Z-REG-FDR under the ideal situation where all assumptions are satisfied. Table  1 shows the results for simulated datasets (1000 simulations of datasets with 10,000 genes and 200 samples) where z ’s are directly simulated from an autoregressive structure, and therefore Assumption (A3) is also satisfied. The estimates are accurate to within about 15% when the true \(\sigma \) is at least 2.0. The control of the FDR is also satisfactory for \(\sigma > 2\) . However, the performance is not as good for small \(\sigma \) , which is due to the fact that it is difficult to separate the null and alternative cases when the effect sizes are small; this is true even when all the assumptions are satisfied. This is a property of the two group mixture model in the empirical Bayes set up, and not a limitation due to the approximate nature of Z-REG-FDR .

Simulations: performance of Z-REG-FDR as an approximate maximum likelihood estimation

We wished to study the accuracy of the estimation under the approximations employed and for a relatively small sample size, in order to ensure that the approach can work in this challenging situation. Accordingly, we simulated data that uses the covariate adjusted genotype matrix of a real dataset from the GTEx project (V3) [ 12 ]. The genotype matrix corresponding to the tissue ‘heart’, which had 83 samples, was selected for analysis. For computational purposes, 10,000 genes were chosen randomly from 28,991 genes. Use of genotype matrices from real data ensures that we are not enforcing Assumption (A3) while simulating, and our choice of \(f_{0|k}\) for the simulation is obtained from the data. We simulate the \(Y_i\) ’s (1,000 simulations) using the following scheme.

For each gene, decide whether it has an eQTL using a Bernoulli( \(\pi _0\) ) distribution.

If the gene has an eQTL, pick a causal SNP using a discrete uniform distribution over the \(m_i\) SNPs. Let it be the k th SNP.

If the gene has an eQTL, simulate each element of \(Y_i\) from \(N(\beta _{ij} X_k^{(i)}, 1-\beta _{ij}^2)\) with \(\sqrt{n-3} \ tanh^{-1}(\beta _{ij})\) simulated from a \(N(0,\sigma ^2)\) distribution. If the gene doesn’t have an eQTL, simulate each element of \(Y_i\) from N (0, 1).
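In code, this simulation scheme might look like the following sketch; the function and variable names are illustrative, and X is assumed to be a standardized, covariate-adjusted genotype matrix (rows = local SNPs, columns = samples):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_gene(X, pi0, sigma):
    """Simulate one gene's expression vector given its standardized,
    covariate-adjusted local genotype matrix X of shape (m_snps, n_samples)."""
    m, n = X.shape
    if rng.random() < pi0:                # gene has no eQTL: pure noise
        return rng.normal(0.0, 1.0, size=n)
    k = rng.integers(m)                   # causal SNP, uniform over the m SNPs
    # Effect size: sqrt(n - 3) * arctanh(beta) ~ N(0, sigma^2)
    beta = np.tanh(rng.normal(0.0, sigma) / np.sqrt(n - 3))
    return rng.normal(beta * X[k], np.sqrt(1 - beta**2))
```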

Table  2 shows the results for this data, indicating that the estimates are still accurate and control of FDR is satisfactory unless \(\sigma \) is very small. Large eQTL studies have observed large effect sizes for cis-eQTL analysis [ 15 , 32 ] which implies that \(\sigma \) is not expected to be very small. Thus our numerical results indicate that the Z-REG-FDR method has valid applications for eQTL data.

Figure  1 shows the plot of REG-FDR estimates against the Z-REG-FDR estimates for 500 simulated datasets using the simulation scheme described above. It is clear from the plot that the two methods agree with each other (with correlations 0.906 and 0.952 for \(\pi _0\) and \(\sigma \) , respectively) and largely fall near the unit line. These results suggest that the approximate maximum likelihood method in Z-REG-FDR is quite effective in controlling the FDR, with a much improved computation speed—a few minutes on a single computer for a dataset with 10,000 genes and 100–200 samples as opposed to more than a day for REG-FDR . A comparison of the estimated lfdr and estimated FDR of the two methods is shown in Fig.  2 . It is evident that the slight over-estimation of \(\pi _0\) and the slight underestimation of \(\sigma \) by Z-REG-FDR work in opposite directions, which leads to similar lfdr values when compared to REG-FDR . The correlation between the estimated FDR based on the true values of the parameters and that based on REG-FDR or Z-REG-FDR are also very high (see Additional file 1 : Figure 3).

Figure 1. Comparison of the parameter estimates using REG-FDR and Z-REG-FDR. Except for a small number of cases, the two estimates agree with each other. The blue lines show the true values of the parameters.

Figure 2. A. Estimated lfdr and B. estimated FDR for REG-FDR and Z-REG-FDR.

Behavior of the expected pseudo-log-likelihood of Z-REG-FDR

It is a standard result that the expected log-likelihood is maximized at the true value of the parameter under standard regularity conditions [ 33 ]. Since REG-FDR is the true maximum likelihood method for the proposed model, it is expected to satisfy this property. If Assumption (A3) is not satisfied then Z-REG-FDR is an approximate maximum likelihood method, and as such, its pseudo-log-likelihood need not be maximized at the true value of the parameter. We explored several realistic combinations of the true parameters and observed that the pseudo-log-likelihood of Z-REG-FDR is maximized very near the true parameter value. It is a difficult task to analytically compute the expected pseudo-log-likelihood, and so Monte-Carlo integration was used for this task. Figure 3 shows the expected pseudo-log-likelihood surface of Z-REG-FDR for \(\pi_0=0.2\) and \(\sigma=3\). A contour plot also confirms that the surface peaks near the true values of the parameters.

Figure 3. Demonstration of the optimization of the log-likelihood using the Z-REG-FDR method. A. Surface plot and B. contour plot of the expected pseudo-log-likelihood surface. True \(\pi_0\) and \(\sigma\) are 0.2 and 3 respectively.

Simulations: comparison of Z-REG-FDR with other methods

It is possible to use other methodologies to control the FDR in the grouped hypothesis testing problem for eQTL data. A conservative approach is to obtain the Bonferroni-adjusted p-values for each gene, where the p-value for each gene-SNP pair is computed based on the usual t-test or z-test, and then use an FDR-controlling approach [e.g. 34 , 35 , 36 ] to assess the conservative p-values. [ 29 ] used a permutation-based ("eGene") approach in their analysis of the GTEx data. The method uses the smallest gene-SNP p-value for a gene as the test statistic and computes its distribution by permuting the expression values. Such a distribution can be used to obtain p-values for each gene, which can subsequently be used to control the FDR by using methods such as Storey's q-value method [ 35 ].

The Bonferroni method is typically conservative and hence less powerful. The permutation method, while correctly controlling false positives, can suffer from lack of power to detect genes having an eQTL since it uses an extreme value statistic (not based on likelihood). Our model, on the other hand, utilizes more information through its use of approximate likelihood. We carried out a simulation study to compare the performance of the methods in terms of their power. The simulations were performed using the simulation scheme described in “ Simulations: performance of Z-REG-FDR as an approximate maximum likelihood estimation ” section and statistical power was obtained using an FDR threshold of 0.05. The results are shown in Fig.  4 . As expected, the Bonferroni method turned out to have very low power and is not shown in Fig.  4 . The permutation approach with Storey’s q -value method [ 35 ] was conservative and less powerful in comparison with Z-REG-FDR . To address the possible concern that Z-REG-FDR can be slightly anti-conservative, and therefore the comparison with the permutation method is unfair, we also included an adjusted version of the Z-REG-FDR method where a slightly lower FDR threshold was chosen based on the simulations in such a way that the estimated FDR was exactly 0.05. This adjusted version had slightly less power compared to unadjusted Z-REG-FDR , but was more powerful than the permutation method.

Figure 4. Power curves of different methods for varying combinations of the true parameter values.

Analysis of real data

Finally, we also applied Z-REG-FDR to a real dataset obtained from GTEx (V6) [ 12 ]. Besides Z-REG-FDR, we also used the permutation method and the Simes method [ 37 ], which is expected to be more powerful than the Bonferroni method although it may not control the FDR for all types of correlation structures. We applied each method to the GTEx data for 44 tissues, separately for each tissue.

For each tissue, the normalized gene expression data and SNP genotype data were separately residualized after adjusting for covariates provided by GTEx. We fit a linear regression model with individuals’ gene expression or SNP genotype as the response variable and covariates as the explanatory variables. Then we extracted the model residuals to obtain “covariate-corrected” gene expression and SNP genotypes.

Figure 5 shows a comparison of the number of significant genes found by Z-REG-FDR and the permutation method employed by [ 12 ]. A complete list of the sample sizes and the number of significant genes discovered for the 44 tissues is provided in Additional file 1 : Table 2. The methods agree with each other to some extent in terms of the number of discoveries. The Z-REG-FDR method has a higher number of discoveries compared to the permutation method and the Simes method in most cases. The parameter estimates for each tissue using Z-REG-FDR are shown in Fig. 6.

Figure 5. Comparison of Z-REG-FDR and the permutation method for GTEx data.

Figure 6. Parameter estimates using Z-REG-FDR for the GTEx data.

We have introduced a principled procedure to perform gene-level FDR testing, most appropriate and useful in the eQTL setting. A major advantage of Z-REG-FDR is its computational efficiency. While other methods such as the permutation method or our REG-FDR method can take days on a single PC to complete the analysis of a real eQTL dataset, Z-REG-FDR can do the same in a few minutes. For instance, it takes approximately two minutes to fit the model and find significant genes by Z-REG-FDR for a data set with 4.5 million SNPs grouped as local SNPs for 10,000 genes. REG-FDR takes about a day, and the permutation method (for 10,000 permutations) takes about 6 hours to analyze the same data. Since there are thousands of simultaneous tests, even 10,000 permutations may not be enough to provide sufficient p-value resolution. While the Bonferroni method is very fast, it has little power to detect the genes having true eQTLs.

Z-REG-FDR has additional advantages. One important feature of the method is that it does not require access to the full data. In fact, the symmetry of the distributions involved in the Z-REG-FDR pseudo-likelihood ensures that only the gene-SNP level p-values (or equivalently the absolute z-values) are needed to fit the model. Z-REG-FDR does not model the correlation structure of the SNPs, and therefore does not require access to that data. This might be very useful since, in many genetic applications, data are found in the form of summary measures.

Z-REG-FDR can be slightly anti-conservative depending on the true values of the parameters. Various simulations show that if \(\sigma \) is large, which appears to often be the case for eQTL data, the control of FDR is satisfactory. The fact that Assumption (A3) is not satisfied does not significantly affect the FDR control. Therefore the assumption can be thought of as a means to reduce computational burden, rather than a necessary assumption for the practical workability of the model.

Assumptions (A1) and (A2) also have the potential to be relaxed, although we consider that to be beyond the scope of this paper. For example, the method can be extended by relaxing Assumption (A2) and incorporating a non-uniform prior for the causal location. If a well-grounded prior exists, then it can be incorporated into our method in a straightforward manner using weighted versions of our statistics. We have included an example in the Additional file 1 to demonstrate empirical evidence that the method remains valid even for more than one causal SNP under certain conditions.

Our use of the lfdr statistics, while valid, does not utilize gene-level local correlation structures [ 38 , 39 , 40 , 41 ] that might provide additional power. Implementation of such methods would require sensitive estimation of gene-level correlations, and is a possible direction for future effort.

With the continuous increase in the size of genomic data sets, and with the possibility of further extensions of our approach, we strongly believe that the approximate likelihood approach of the Z-REG-FDR method can be of great practical use for grouped hypothesis testing for eQTL analysis or similar problems in statistical genomics.

Availability of data and materials

Supplementary material is available in the file Supplementary.pdf. Software in the form of R code and documentation is available at https://doi.org/10.5281/zenodo.8331734 .

Rockman MV, Kruglyak L. Genetics of global gene expression. Nat Rev Genet. 2006;7(11):862–72.


Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N, et al. The genotype-tissue expression (gtex) project. Nat Genet. 2013;45(6):580–5.


Palowitch J, Shabalin A, Zhou Y-H, Nobel AB, Wright FA. Estimation of cis-eqtl effect sizes using a log of linear model. Biometrics. 2018;74(2):616–25.

Nica AC, Dermitzakis ET. Expression quantitative trait loci: present and future. Philos Trans R Soc B Biol Sci. 2013;368(1620):20120362.


Cookson W, Liang L, Abecasis G, Moffatt M, Lathrop M. Mapping complex disease traits with global gene expression. Nat Rev Genet. 2009;10(3):184–94.


Hu JX, Zhao H, Zhou HH. False discovery rate control with groups. J Am Stat Assoc. 2010;105(491):1215–27.

Cai TT, Sun W. Simultaneous testing of grouped hypotheses: finding needles in multiple haystacks. J Am Stat Assoc. 2009;104(488):1467–81.

Zhao H, Zhang J. Weighted p-value procedures for controlling fdr of grouped hypotheses. J Stat Plan Inference. 2014;151:90–106.

Huang QQ, Ritchie SC, Brozynska M, Inouye M. Power, false discovery rate and winner’s curse in eqtl studies. Nucleic Acids Res. 2018;46(22):e133–e133.


Sul JH, Raj T, De Jong S, De Bakker PIW, Raychaudhuri S, Ophoff RA, Stranger BE, Eskin E, Han B. Accurate and fast multiple-testing correction in eQTL studies. Am J Hum Genet 2015;96(6):857–868.

Westra H-J. From genome to function by studying eqtls. Biochimica et Biophysica Acta (BBA)-Molecular Basis of Disease. 2014;1842(10):1896–902.

GTEx Consortium. Genetic effects on gene expression across human tissues. Nature. 2017;550(7675):204.

Peterson CB, Bogomolov M, Benjamini Y, Sabatti C. Many phenotypes without many false discoveries: error controlling strategies for multitrait association studies. Genet Epidemiol. 2016;40(1):45–56.


GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369(6509):1318–30.

Wright FA, Sullivan PF, Brooks AI, Zou F, Sun W, Xia K, Madar V, Jansen R, Chung W, Zhou Y-H, et al. Heritability and genomics of gene expression in peripheral blood. Nat Genet. 2014;46(5):430–7.

Stegle O, Parts L, Piipari M, Winn J, Durbin R. Using probabilistic estimation of expression residuals (peer) to obtain increased power and interpretability of gene expression analyses. Nat Protoc. 2012;7(3):500–7.

Shabalin AA. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012;28(10):1353–8.

Zhou HJ, Li L, Li Y, Li W, Li JJ. PCA outperforms popular hidden variable inference methods for molecular QTL mapping. Genome Biol. 2022;23(1):1–17.

Efron B, Tibshirani R. Empirical Bayes methods and false discovery rates for microarrays. Genet Epidemiol. 2002;23(1):70–86.

Ferkingstad E, Frigessi A, Rue H, Thorleifsson G, Kong A. Unsupervised empirical Bayesian multiple testing with external covariates. Ann Appl Stat. 2008;2(2):714–35.

Efron B, Storey JD, Tibshirani R. Microarrays, empirical Bayes methods, and false discovery rates. Technical report; 2001.

Kendziorski CM, Newton MA, Lan H, Gould MN. On parametric empirical bayes methods for comparing multiple groups using replicated gene expression profiles. Stat Med. 2003;22(24):3899–914.

Newton MA, Noueiry A, Sarkar D, Ahlquist P. Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics. 2004;5(2):155–76.

Sun W, Cai TT. Oracle and adaptive compound decision rules for false discovery rate control. J Am Stat Assoc. 2007;102(479):901–12.

Li G, Shabalin AA, Rusyn I, Wright FA, Nobel AB. An empirical bayes approach for multiple tissue eQTL analysis. Biostatistics. 2018;19(3):391–406.

Jansen R, Hottenga J-J, Nivard MG, Abdellaoui A, Laport B, de Geus EJ, Wright FA, Penninx BWJH, Boomsma DI. Conditional eQTL analysis reveals allelic heterogeneity of gene expression. Hum Mol Genet. 2017;26(8):1444–51.

Kendziorski CM, Chen M, Yuan M, Lan H, Attie AD. Statistical methods for expression quantitative trait loci (eQTL) mapping. Biometrics. 2006;62(1):19–27.

Gelfond JAL, Ibrahim JG, Zou F. Proximity model for expression quantitative trait loci (eQTL) detection. Biometrics. 2007;63(4):1108–16.

Ardlie KG, Deluca DS, Segrè AV, Sullivan TJ, Young TR, Gelfand ET, Trowbridge CA, Maller JB, Tukiainen T, Lek M, et al. The genotype-tissue expression (gtex) pilot analysis: multitissue gene regulation in humans. Science. 2015;348(6235):648–60.

Varin C, Reid N, Firth D. An overview of composite likelihood methods. Stat Sin. 2011;21(1):5–42.


Xu X, Reid N. On the robustness of maximum composite likelihood estimate. J Stat Plan Inference. 2011;141(9):3047–54.

Joehanes R, Zhang X, Huan T, Yao C, Ying S, Nguyen QT, Demirkale CY, Feolo ML, Sharopova NR, Sturcke A, et al. Integrated genome-wide analysis of expression quantitative trait loci aids interpretation of genomic association studies. Genome Biol. 2017;18(1):1–24.

Cox DR, Hinkley DV. Theoretical statistics. CRC Press;1979.

Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodol). 1995;57(1):289–300.

Storey JD. A direct approach to false discovery rates. J R Stat Soc Ser B (Stat Methodol). 2002;64(3):479–98.

Strimmer K. A unified approach to false discovery rate estimation. BMC Bioinf. 2008;9(1):303.

Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika. 1986;73(3):751–4.

Wei Z, Li H. A Markov random field model for network-based analysis of genomic data. Bioinformatics. 2007;23(12):1537–44.

Sun W, Tony Cai T. Large-scale multiple testing under dependence. J R Stat Soc Ser B (Stat Methodol). 2009;71(2):393–424.

Wei Z, Sun W, Wang K, Hakonarson H. Multiple testing in genome-wide association studies via hidden Markov models. Bioinformatics. 2009;25(21):2802–8.

Xiao J, Zhu W, Guo J. Large-scale multiple testing in genome-wide association studies via region-specific hidden Markov models. BMC Bioinf. 2013;14:1–12.

Download references

Acknowledgements

Not applicable.

Funding: Supported in part by R01ES033243 and R01ES029911.

Author information

Authors and affiliations.

Department of Statistics, Oklahoma State University, Stillwater, OK, USA

Pratyaydipta Rudra

Bioinformatics Research Center, Departments of Statistics and Biological Sciences, North Carolina State University, Raleigh, NC, USA

Yi-Hui Zhou & Fred A. Wright

Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, USA

Andrew Nobel


Contributions

PR constructed the models and performed the statistical analyses of simulated and real data. YZ conducted pre-processing and covariate adjustment for the real data. FAW and AN supervised the modeling and analysis. All authors have read and approved the final version of this manuscript.

Corresponding authors

Correspondence to Pratyaydipta Rudra or Fred A. Wright .

Ethics declarations

Competing interests

The authors declare no competing interests.


Supplementary Information

Additional file 1: Supplementary Materials.



Rudra, P., Zhou, YH., Nobel, A. et al. Control of false discoveries in grouped hypothesis testing for eQTL data. BMC Bioinformatics 25 , 147 (2024). https://doi.org/10.1186/s12859-024-05736-3

Received: 18 August 2023

Accepted: 08 March 2024

Published: 11 April 2024

DOI: https://doi.org/10.1186/s12859-024-05736-3

Keywords

  • Grouped hypothesis testing
  • False discovery rate
  • Empirical Bayes
