13.2 Testing the Significance of the Correlation Coefficient

The correlation coefficient, r , tells us about the strength and direction of the linear relationship between X 1 and X 2 .

The sample data are used to compute r , the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r , is our estimate of the unknown population correlation coefficient.

  • ρ = population correlation coefficient (unknown)
  • r = sample correlation coefficient (known; calculated from sample data)
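As a minimal sketch of how r is calculated from sample data, the function below applies the usual Pearson formula using only the Python standard library. The function name and the data values are invented for illustration and do not come from the text.

```python
import math

def sample_correlation(xs, ys):
    """Pearson sample correlation coefficient r for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Invented illustration data (hypothetical, not from the text).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
r = sample_correlation(x1, x2)
print(round(r, 4))
```

The value printed is only an estimate of ρ; a different sample from the same population would generally give a different r.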

The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is "close to zero" or "significantly different from zero". We decide this based on the sample correlation coefficient r and the sample size n .

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is "significant."

  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between X 1 and X 2 because the correlation coefficient is significantly different from zero.
  • What the conclusion means: There is a significant linear relationship between X 1 and X 2 .

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is "not significant".

Performing the Hypothesis Test

  • Null Hypothesis: H 0 : ρ = 0
  • Alternate Hypothesis: H a : ρ ≠ 0
  • Null Hypothesis H 0 : The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between X 1 and X 2 in the population.
  • Alternate Hypothesis H a : The population correlation coefficient is significantly different from zero. There is a significant linear relationship (correlation) between X 1 and X 2 in the population.

Drawing a Conclusion

There are two methods of making the decision concerning the hypothesis. The test statistic to test this hypothesis is:

\(t = \dfrac{r}{\sqrt{\left(1-r^{2}\right)/\left(n-2\right)}} = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}\)

where the second formula is an equivalent form of the test statistic, n is the sample size, and the degrees of freedom are n-2. This is a t-statistic and operates in the same way as other t-tests. Calculate the t-value and compare it with the critical value from the t-table at the appropriate degrees of freedom and the level of significance you wish to use. If the calculated value is in the tail, then we reject the null hypothesis that there is no linear relationship between the two random variables. If the calculated t-value is NOT in the tail, then we cannot reject the null hypothesis that there is no linear relationship between the two variables.
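The calculation can be sketched in a few lines of Python, assuming the standard form of the statistic, t = r√(n−2)/√(1−r²), with n−2 degrees of freedom. The function names and the values of r and n are hypothetical and chosen only for illustration.

```python
import math

def t_statistic(r, n):
    """t statistic for H0: rho = 0, written as r over its standard error."""
    return r / math.sqrt((1 - r**2) / (n - 2))

def t_statistic_alt(r, n):
    """Algebraically equivalent form of the same statistic."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# Hypothetical sample values (not from the text).
r, n = 0.80, 12
t = t_statistic(r, n)
df = n - 2  # degrees of freedom used to look up the critical t-value
print(round(t, 3), df)
```

The resulting t is then compared with the two-tailed critical value from a t-table at df degrees of freedom.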

A quick shorthand way to test correlations is the relationship between the sample size and the correlation. If:

\(|r| \geq \dfrac{2}{\sqrt{n}}\)

then this implies that the correlation between the two variables demonstrates that a linear relationship exists and is statistically significant at approximately the 0.05 level of significance. As the formula indicates, there is an inverse relationship between the sample size and the required correlation for significance of a linear relationship. With only 10 observations, the required correlation for significance is 0.6325; for 30 observations, the required correlation decreases to 0.3651; and at 100 observations, the required level is only 0.2000.
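The three thresholds quoted above are consistent with the rule of thumb |r| ≥ 2/√n. A small sketch, with hypothetical function names, reproduces them:

```python
import math

def approx_required_r(n):
    """Approximate |r| needed for significance at about the 0.05 level."""
    return 2 / math.sqrt(n)

def roughly_significant(r, n):
    """Rule-of-thumb check only; it does not replace the t-test."""
    return abs(r) >= approx_required_r(n)

for n in (10, 30, 100):
    print(n, round(approx_required_r(n), 4))
```

Because this is only an approximation, borderline cases are better settled with the t-statistic and its exact degrees of freedom.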

Correlations may be helpful in visualizing the data, but are not appropriately used to "explain" a relationship between two variables. Perhaps no single statistic is more misused than the correlation coefficient. Citing correlations between health conditions and everything from place of residence to eye color has the effect of implying a cause-and-effect relationship. This simply cannot be accomplished with a correlation coefficient. The correlation coefficient is, of course, innocent of this misinterpretation. It is the duty of the analyst to use a statistic that is designed to test for cause-and-effect relationships and to report only those results if they intend to make such a claim. The problem is that passing this more rigorous test is difficult, so lazy and/or unscrupulous "researchers" fall back on correlations when they cannot make their case legitimately.

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Access for free at https://openstax.org/books/introductory-business-statistics/pages/1-introduction
  • Authors: Alexander Holmes, Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Introductory Business Statistics
  • Publication date: Nov 29, 2017
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/introductory-business-statistics/pages/1-introduction
  • Section URL: https://openstax.org/books/introductory-business-statistics/pages/13-2-testing-the-significance-of-the-correlation-coefficient

© Jun 23, 2022 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

Module 12: Linear Regression and Correlation

Testing the Significance of the Correlation Coefficient

Learning Outcomes

  • Calculate and interpret the correlation coefficient

The correlation coefficient,  r , tells us about the strength and direction of the linear relationship between x and y . However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n , together.

We perform a hypothesis test of the “ significance of the correlation coefficient ” to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute r , the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r , is our estimate of the unknown population correlation coefficient.

  • The symbol for the population correlation coefficient is ρ , the Greek letter “rho.”
  • ρ = population correlation coefficient (unknown)
  • r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is “close to zero” or “significantly different from zero”. We decide this based on the sample correlation coefficient r and the sample size n .

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is “significant.”

  • Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.
  • What the conclusion means: There is a significant linear relationship between x and y . We can use the regression line to model the linear relationship between x and y in the population.

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is “not significant.”

  • Conclusion: “There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero.”
  • What the conclusion means: There is not a significant linear relationship between x and y . Therefore, we CANNOT use the regression line to model a linear relationship between x and y in the population.

  • If r is significant and the scatter plot shows a linear trend, the line can be used to predict the value of y for values of x that are within the domain of observed x values.
  • If r is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
  • If r is significant and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed x values in the data.

Performing the Hypothesis Test

  • Null Hypothesis: H 0 : ρ = 0
  • Alternate Hypothesis: H a : ρ ≠ 0

What the Hypotheses Mean in Words

  • Null Hypothesis H 0 : The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship(correlation) between x and y in the population.
  • Alternate Hypothesis H a : The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between x and y in the population.

Drawing a Conclusion

There are two methods of making the decision. The two methods are equivalent and give the same result.

  • Method 1: Using the p -value
  • Method 2: Using a table of critical values

In this chapter of this textbook, we will always use a significance level of 5%, α = 0.05.

Using the  p -value method, you could choose any appropriate significance level you want; you are not limited to using α = 0.05. But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, α = 0.05. (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook.)

Method 1: Using a p -value to make a decision

To calculate the  p -value using LinRegTTEST:

  • On the LinRegTTEST input screen, on the line prompt for β or ρ , highlight “≠ 0”
  • The output screen shows the p-value on the line that reads “p =”.
  • (Most computer statistical software can calculate the p -value.)

If the p -value is less than the significance level ( α = 0.05)

  • Decision: Reject the null hypothesis.
  • Conclusion: “There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.”

If the p -value is NOT less than the significance level ( α = 0.05)

  • Decision: DO NOT REJECT the null hypothesis.
  • Conclusion: “There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is NOT significantly different from zero.”

Calculation Notes:

  • You will use technology to calculate the p -value. The following describes the calculations to compute the test statistics and the p -value:
  • The p -value is calculated using a t -distribution with n – 2 degrees of freedom.
  • The formula for the test statistic is [latex]\displaystyle{t}=\frac{{{r}\sqrt{{{n}-{2}}}}}{\sqrt{{{1}-{r}^{{2}}}}}[/latex]. The value of the test statistic, t , is shown in the computer or calculator output along with the p -value. The test statistic t has the same sign as the correlation coefficient r .
  • The p -value is the combined area in both tails.

An alternative way to calculate the  p -value (p) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.
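For readers without a calculator, the calculation notes above can be approximated in plain Python. This is a rough sketch, not the calculator's algorithm: it assumes the two-tailed area can be obtained by numerically integrating the t density with Simpson's rule, and all function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p_value(t, df, steps=2000):
    """Combined area in both tails beyond |t| (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    # Simpson's rule for the area between 0 and |t|.
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return max(0.0, 2 * (0.5 - area_0_to_t))

def correlation_p_value(r, n):
    """Test statistic and two-tailed p-value for H0: rho = 0."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, two_tailed_p_value(t, df)

t, p = correlation_p_value(0.801, 10)
print("reject H0" if p < 0.05 else "do not reject H0")
```

In practice a statistics package or the calculator's own routine would normally be used; the sketch only makes the "area in both tails" idea concrete.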

Method 2: Using a table of Critical Values to make a decision

The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of r is significant or not. Compare r to the appropriate critical value in the table. If r is not between the positive and negative critical values, then the correlation coefficient is significant. If r is significant, then you may want to use the line for prediction.
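The decision rule can be sketched as a simple comparison. The sketch also shows one way a critical value of r can be derived from a critical t-value by solving the test-statistic formula for r; the t-values used (2.306 for df = 8, 2.179 for df = 12) are assumptions taken from a standard two-tailed t-table at α = 0.05, not from this text, and the function names are hypothetical.

```python
import math

def critical_r(t_crit, df):
    """Critical |r| implied by a two-tailed critical t-value with df = n - 2."""
    return t_crit / math.sqrt(t_crit**2 + df)

def is_significant(r, crit):
    """True when r lies outside the interval from -crit to +crit."""
    return r < -crit or r > crit

# Assumed t-table values at alpha = 0.05 (two-tailed).
print(round(critical_r(2.306, 8), 3))   # critical r for n = 10
print(round(critical_r(2.179, 12), 3))  # critical r for n = 14

print(is_significant(0.801, 0.632))
print(is_significant(0.776, 0.811))
```

This also illustrates why the table method and the p-value method agree: both are the same t-test expressed on different scales.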

Suppose you computed  r = 0.801 using n = 10 data points. df = n – 2 = 10 – 2 = 8. The critical values associated with df = 8 are -0.632 and + 0.632. If r < negative critical value or r > positive critical value, then r is  significant . Since r = 0.801 and 0.801 > 0.632, r is significant and the line may be used for prediction. If you view this example on a number line, it will help you.

Horizontal number line with values of -1, -0.632, 0, 0.632, 0.801, and 1. A dashed line above values -0.632, 0, and 0.632 indicates not significant values.

For a given line of best fit, you computed that  r = 0.6501 using n = 12 data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?

If the scatter plot looks linear then, yes, the line can be used for prediction, because  r > the positive critical value.

Suppose you computed r = –0.624 with 14 data points. df = 14 – 2 = 12. The critical values are –0.532 and 0.532. Since –0.624 < –0.532, r is significant and the line can be used for prediction.

Horizontal number line with values of -0.624, -0.532, and 0.532.

For a given line of best fit, you compute that  r = 0.5204 using n = 9 data points, and the critical value is 0.666. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction, because  r < the positive critical value.

Suppose you computed  r = 0.776 and n = 6. df = 6 – 2 = 4. The critical values are –0.811 and 0.811. Since –0.811 < 0.776 < 0.811, r is not significant, and the line should not be used for prediction.

Horizontal number line with values of -0.811, 0.776, and 0.811.

–0.811 <  r = 0.776 < 0.811. Therefore, r is not significant.

For a given line of best fit, you compute that r = –0.7204 using n = 8 data points, and the critical value is 0.707. Can the line be used for prediction? Why or why not?

Yes, the line can be used for prediction, because  r < the negative critical value.

Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if  r is significant and the line of best fit associated with each r can be used to predict a y value. If it helps, draw a number line.

  • r = –0.567 and the sample size, n , is 19. The df = n – 2 = 17. The critical value is –0.456. –0.567 < –0.456 so r is significant.
  • r = 0.708 and the sample size, n , is nine. The df = n – 2 = 7. The critical value is 0.666. 0.708 > 0.666 so r is significant.
  • r = 0.134 and the sample size, n , is 14. The df = 14 – 2 = 12. The critical value is 0.532. 0.134 is between –0.532 and 0.532 so r is not significant.
  • r = 0 and the sample size, n , is five. No matter what the dfs are, r = 0 is between the two critical values so r is not significant.

For a given line of best fit, you compute that  r = 0 using n = 100 data points. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction no matter what the sample size is.

Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between x and y in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatterplot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

The assumptions underlying the test of significance are:

  1. There is a linear relationship in the population that models the average value of y for varying values of x . In other words, the expected value of y for each particular x value lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population.)
  2. The y values for any particular x value are normally distributed about the line. This implies that there are more y values scattered closer to the line than are scattered farther away. Assumption (1) implies that these normal distributions are centered on the line: the means of these normal distributions of y values lie on the line.
  3. The standard deviations of the population y values about the line are equal for each value of x . In other words, each of these normal distributions of y values has the same shape and spread about the line.
  4. The residual errors are mutually independent (no pattern).
  5. The data are produced from a well-designed, random sample or randomized experiment.

The left graph shows three sets of points. Each set falls in a vertical line. The points in each set are normally distributed along the line — they are densely packed in the middle and more spread out at the top and bottom. A downward sloping regression line passes through the mean of each set. The right graph shows the same regression line plotted. A vertical normal curve is shown for each line.

The  y values for each x value are normally distributed about the line with the same standard deviation. For each x value, the mean of the y values lies on the regression line. More y values lie near the line than are scattered further away from the line.

Concept Review

Linear regression is a procedure for fitting a straight line of the form [latex]\displaystyle\hat{{y}}={a}+{b}{x}[/latex] to data. The conditions for regression are:

  • Linear: In the population, there is a linear relationship that models the average value of y for different values of x .
  • Independent: The residuals are assumed to be independent.
  • Normal: The y values are distributed normally for any value of x .
  • Equal variance: The standard deviation of the y values is equal for each x value.
  • Random: The data are produced from a well-designed random sample or randomized experiment.

The slope  b and intercept a of the least-squares line estimate the slope β and intercept α of the population (true) regression line. To estimate the population standard deviation of y , σ , use the standard deviation of the residuals, s .

[latex]\displaystyle{s}=\sqrt{{\frac{{{S}{S}{E}}}{{{n}-{2}}}}}[/latex]

The variable ρ (rho) is the population correlation coefficient.

To test the null hypothesis  H 0 : ρ = hypothesized value , use a linear regression t-test. The most common null hypothesis is H 0 : ρ = 0 which indicates there is no linear relationship between x and y in the population.

The TI-83, 83+, 84, 84+ calculator function LinRegTTest can perform this test (STAT TESTS LinRegTTest).

Formula Review

Least Squares Line or Line of Best Fit: [latex]\displaystyle\hat{{y}}={a}+{b}{x}[/latex]

where  a = y -intercept,  b = slope

Standard deviation of the residuals:

[latex]\displaystyle{s}=\sqrt{{\frac{{{S}{S}{E}}}{{{n}-{2}}}}}[/latex]

SSE = sum of squared errors

n = the number of data points

  • OpenStax, Statistics, Testing the Significance of the Correlation Coefficient. Provided by : OpenStax. Located at : http://cnx.org/contents/[email protected]:83/Introductory_Statistics . License : CC BY: Attribution
  • Introductory Statistics . Authored by : Barbara Illowsky, Susan Dean. Provided by : OpenStax. Located at : http://cnx.org/contents/[email protected] . License : CC BY: Attribution . License Terms : Download for free at http://cnx.org/contents/[email protected]

6.3 - Testing for Partial Correlation

When discussing ordinary correlations we looked at tests for the null hypothesis that the ordinary correlation is equal to zero, against the alternative that it is not equal to zero. If that null hypothesis is rejected, then we look at confidence intervals for the ordinary correlation. Similar objectives can be considered for the partial correlation.

First, consider testing the null hypothesis that a partial correlation is equal to zero against the alternative that it is not equal to zero. This is expressed below:

\(H_0\colon \rho_{jk\textbf{.x}}=0\) against \(H_a\colon \rho_{jk\textbf{.x}}\ne 0\)

Here we will use a test statistic that is similar to the one we used for an ordinary correlation. This test statistic is shown below:

\(t = r_{jk\textbf{.x}}\sqrt{\frac{n-2-c}{1-r^2_{jk\textbf{.x}}}}\)      \(\dot{\sim}\)  \(t_{n-2-c}\)

The only difference between this and the previous one is what appears in the numerator of the radical. Before we just took n - 2. Here we take n - 2 - c , where c is the number of variables upon which we are conditioning. In our Adult Intelligence data, we conditioned on two variables so c would be equal to 2 in this case.

Under the null hypothesis, this test statistic will be approximately t -distributed, also with n - 2 - c degrees of freedom.

We would reject \(H_{0}\) if the absolute value of the test statistic exceeded the critical value from the t -table evaluated at \(\alpha\) over 2:

\(|t| > t_{n-2-c, \alpha/2}\)

Example 6-3: Wechsler Adult Intelligence Data

For the Wechsler Adult Intelligence Data, we found a partial correlation of 0.711879, which we enter into the expression for the test statistic as shown below:

\(t = 0.711879 \sqrt{\dfrac{37-2-2}{1-0.711879^2}}=5.82\)

Here the sample size, 37, and the number of variables on which we are conditioning, 2, are substituted into the formula. Carrying out the arithmetic gives a test statistic of 5.82, as shown above.

Here we want to compare this value to a t -distribution with 33 degrees of freedom for an \(\alpha\) = 0.01 level test. Therefore, we look up the critical value for 0.005 in the table (because 33 does not appear in the table, we use the closest df that does not exceed 33, which is 30). In this case it is 2.75, meaning that \(t _ { ( d f , 1 - \alpha / 2 ) } = t _ { ( 33,0.995 ) } \) is approximately 2.75.

Because \(5.82 > 2.75 = t _ { ( 33,0.995 ) }\), we can reject the null hypothesis, \(H_{0}\), at the \(\alpha = 0.01\) level and conclude that there is a significant partial correlation between these two variables. In particular, we would conclude that this partial correlation is positive, indicating that even after taking into account Arithmetic and Picture Completion, there is a positive association between Information and Similarities.
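The arithmetic in this example can be reproduced with a short sketch. The function name is hypothetical; the inputs (partial correlation 0.711879, n = 37, c = 2) and the critical value 2.75 are the ones quoted in the example.

```python
import math

def partial_correlation_t(r_partial, n, c):
    """t statistic and degrees of freedom for H0: partial correlation = 0.

    c is the number of variables being conditioned on.
    """
    df = n - 2 - c
    t = r_partial * math.sqrt(df / (1 - r_partial**2))
    return t, df

t, df = partial_correlation_t(0.711879, n=37, c=2)
critical_value = 2.75  # table value quoted in the example (df = 30 row)
print(round(t, 2), df, abs(t) > critical_value)
```

Setting c = 0 reduces the function to the test for an ordinary correlation with n - 2 degrees of freedom.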

Confidence Interval for the partial correlation, \(\rho_{jk\textbf{.x}}\)

The procedure here is very similar to the procedure we used for ordinary correlation.

Compute Fisher's transformation of the partial correlation using the same formula as before.

\(z_{jk} = \dfrac{1}{2}\log \left( \dfrac{1+r_{jk\textbf{.X}}}{1-r_{jk\textbf{.X}}}\right) \)

In this case, for a large n , this Fisher transform variable will be approximately normally distributed. The mean is equal to the Fisher transform of the population value of this partial correlation, and the variance is equal to 1 over n-3-c .

\(z_{jk}\)  \(\dot{\sim}\)  \(N \left( \dfrac{1}{2}\log \dfrac{1+\rho_{jk\textbf{.X}}}{1-\rho_{jk\textbf{.X}}}, \dfrac{1}{n-3-c}\right)\)

Compute a \((1 - \alpha) × 100\%\) confidence interval for the Fisher transform correlation. This expression is shown below:

\( \dfrac{1}{2}\log \dfrac{1+\rho_{jk\textbf{.X}}}{1-\rho_{jk\textbf{.X}}}\)

This yields the bounds \(Z_{l}\) and  \(Z_{u}\)  as before.

\(\left(\underset{Z_l}{\underbrace{Z_{jk}-\dfrac{Z_{\alpha/2}}{\sqrt{n-3-c}}}}, \underset{Z_U}{\underbrace{Z_{jk}+\dfrac{Z_{\alpha/2}}{\sqrt{n-3-c}}}}\right)\)

Back transform to obtain the desired confidence interval for the partial correlation - \(\rho_{jk\textbf{.X}}\)

\(\left(\dfrac{e^{2Z_l}-1}{e^{2Z_l}+1}, \dfrac{e^{2Z_U}-1}{e^{2Z_U}+1}\right)\)

Example 6-3: Wechsler Adult Intelligence Data (Steps Shown)

The confidence interval is calculated by substituting the results from the Wechsler Adult Intelligence Data into the appropriate steps below:

Step 1 : Compute the Fisher transform:

\begin{align} Z_{12} &= \dfrac{1}{2}\log \frac{1+r_{12.34}}{1-r_{12.34}}\\[5pt] &= \dfrac{1}{2} \log \frac{1+0.711879}{1-0.711879}\\[5pt] &= 0.89098 \end{align}

Step 2 : Compute the 95% confidence interval for \( \frac{1}{2}\log \frac{1+\rho_{12.34}}{1-\rho_{12.34}}\) :

\begin{align} Z_l &= Z_{12}-Z_{0.025}/\sqrt{n-3-c}\\[5pt] & = 0.89098 - \dfrac{1.96}{\sqrt{37-3-2}}\\[5pt] &= 0.5445 \end{align}

\begin{align} Z_U &= Z_{12}+Z_{0.025}/\sqrt{n-3-c}\\[5pt] &= 0.89098 + \dfrac{1.96}{\sqrt{37-3-2}} \\[5pt] &= 1.2375 \end{align}

Step 3 : Back-transform to obtain the 95% confidence interval for \(\rho_{12.34}\) :

\(\left(\dfrac{\exp\{2Z_l\}-1}{\exp\{2Z_l\}+1}, \dfrac{\exp\{2Z_U\}-1}{\exp\{2Z_U\}+1}\right)\)

\(\left(\dfrac{\exp\{2\times 0.5445\}-1}{\exp\{2\times 0.5445\}+1}, \dfrac{\exp\{2\times 1.2375\}-1}{\exp\{2\times 1.2375\}+1}\right)\)

\((0.4964, 0.8447)\)

Based on this result, we can conclude that we are 95% confident that the interval (0.4964, 0.8447) contains the partial correlation between Information and Similarities scores given scores on Arithmetic and Picture Completion.
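The three steps can be sketched in Python using the standard library. The function name is hypothetical. The sketch relies on two identities: `math.atanh(r)` equals ½·log((1+r)/(1−r)), and `math.tanh(z)` equals (e^{2z}−1)/(e^{2z}+1). It also takes the normal critical value from `statistics.NormalDist` rather than rounding it to 1.96, so the last digit may differ slightly from a hand calculation.

```python
import math
from statistics import NormalDist

def partial_correlation_ci(r_partial, n, c, alpha=0.05):
    """Approximate (1 - alpha) confidence interval via Fisher's transformation."""
    z = math.atanh(r_partial)                      # Step 1: Fisher transform
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    half_width = z_crit / math.sqrt(n - 3 - c)     # Step 2: interval on the z scale
    # Step 3: back-transform both bounds to the correlation scale.
    return math.tanh(z - half_width), math.tanh(z + half_width)

lower, upper = partial_correlation_ci(0.711879, n=37, c=2)
print(round(lower, 4), round(upper, 4))
```

Note that the interval is not symmetric about 0.711879, because the back-transformation is nonlinear.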

Statistics LibreTexts

12.2.1: Hypothesis Test for Linear Regression

  • Rachel Webb
  • Portland State University

To test whether the slope is significant, we will do a two-tailed test. The population least squares regression line is \(y = \beta_{0} + \beta_{1}x + \varepsilon\), where \(\beta_{0}\) (pronounced “beta-naught”) is the population \(y\)-intercept, \(\beta_{1}\) (pronounced “beta-one”) is the population slope, and \(\varepsilon\) is called the error term.

If the slope were horizontal (equal to zero), the regression line would give the same \(y\)-value for every input of \(x\) and would be of no use. If there is a statistically significant linear relationship then the slope needs to be different from zero. We will only do the two-tailed test, but the same rules for hypothesis testing apply for a one-tailed test.

We will only be using the two-tailed test for a population slope.

The hypotheses are:

\(H_{0}: \beta_{1} = 0\) \(H_{1}: \beta_{1} \neq 0\)

The null hypothesis of a two-tailed test states that there is not a linear relationship between \(x\) and \(y\). The alternative hypothesis of a two-tailed test states that there is a significant linear relationship between \(x\) and \(y\).

Either a t-test or an F-test may be used to see if the slope is significantly different from zero. The population of the variable \(y\) must be normally distributed.

F-Test for Regression

An F-test can be used instead of a t-test. Both tests will yield the same results, so it is a matter of preference and what technology is available. Figure 12-12 is a template for a regression ANOVA table,

Template for a regression table, containing equations for the sum of squares, degrees of freedom and mean square for regression and for error, as well as the F value of the data.

where \(n\) is the number of pairs in the sample and \(p\) is the number of predictor (independent) variables; for now this is just \(p = 1\). Use the F-distribution with degrees of freedom for regression = \(df_{R} = p\), and degrees of freedom for error = \(df_{E} = n - p - 1\). This F-test is always a right-tailed test, since ANOVA is testing whether the variation in the regression model is larger than the variation in the error.
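As a rough sketch of how the entries of such an ANOVA table are obtained for a single predictor, the function below computes the sums of squares, degrees of freedom, mean squares, and F. It assumes the usual definitions (SSR is the variation of the fitted values about the mean of y, SSE is the sum of squared residuals). The function name and the hours/grades data are invented for illustration and are not the data set from the text.

```python
def regression_anova(xs, ys):
    """ANOVA quantities for simple linear regression (p = 1 predictor)."""
    n = len(xs)
    p = 1
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx              # least-squares slope
    b0 = mean_y - b1 * mean_x       # least-squares intercept
    fitted = [b0 + b1 * x for x in xs]
    ssr = sum((f - mean_y) ** 2 for f in fitted)           # regression SS
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # error SS
    df_r, df_e = p, n - p - 1
    msr, mse = ssr / df_r, sse / df_e
    return {"SSR": ssr, "SSE": sse, "df_R": df_r, "df_E": df_e,
            "MSR": msr, "MSE": mse, "F": msr / mse}

# Invented illustration data (hypothetical).
hours = [1, 2, 3, 4, 5, 6]
grades = [52, 60, 63, 71, 74, 85]
table = regression_anova(hours, grades)
for name, value in table.items():
    print(name, round(value, 3))
```

The F value would then be compared with the right-tail critical value of the F-distribution with df_R and df_E degrees of freedom, which this sketch does not compute.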

Use an F-test to see if there is a significant relationship between hours studied and grade on the exam. Use \(\alpha\) = 0.05.

T-Test for Regression

If the regression equation has a slope of zero, then every \(x\) value will give the same \(y\) value and the regression equation would be useless for prediction. We should perform a t-test to see if the slope is significantly different from zero before using the regression equation for prediction. The numeric value of t will be the same as the t-test for a correlation. The two test statistic formulas are algebraically equal; however, the formulas are different and we use a different parameter in the hypotheses.

The formula for the t-test statistic is \(t = \frac{b_{1}}{\sqrt{ \left(\frac{MSE}{SS_{xx}}\right) }}\)

Use the t-distribution with degrees of freedom equal to \(n - p - 1\).
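The claim that the slope t-statistic equals the correlation t-statistic can be checked numerically. The sketch below assumes a single predictor, so MSE = SSE/(n − 2); the function names and the data are invented for illustration.

```python
import math

def _sums(xs, ys):
    """Sums of squares and cross-products about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return n, mean_x, mean_y, ss_xx, ss_yy, ss_xy

def slope_t_statistic(xs, ys):
    """t = b1 / sqrt(MSE / SSxx) for the least-squares slope b1."""
    n, mean_x, mean_y, ss_xx, _, ss_xy = _sums(xs, ys)
    b1 = ss_xy / ss_xx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)
    return b1 / math.sqrt(mse / ss_xx)

def correlation_t_statistic(xs, ys):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2) for the sample correlation r."""
    n, _, _, ss_xx, ss_yy, ss_xy = _sums(xs, ys)
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Invented illustration data (hypothetical).
hours = [1, 2, 3, 4, 5, 6]
grades = [52, 60, 63, 71, 74, 85]
t_slope = slope_t_statistic(hours, grades)
t_corr = correlation_t_statistic(hours, grades)
print(math.isclose(t_slope, t_corr, rel_tol=1e-9))
```

With one predictor, the square of this t value also equals the F statistic from the regression ANOVA table, which is why the t-test and F-test reach the same decision.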

The t-test for slope has the same hypotheses as the F-test:

\(H_{0}: \beta_{1} = 0\) \(H_{1}: \beta_{1} \neq 0\)

Use a t-test to see if there is a significant relationship between hours studied and grade on the exam, use \(\alpha\) = 0.05.


  6. 9.4.1

    Next, we need to find the p-value. The p-value for the two-sided test is: \ (\text {p-value}=2P (T>5.1556)<0.0001\) Therefore, for any reasonable \ (\alpha\) level, we can reject the hypothesis that the population correlation coefficient is 0 and conclude that it is nonzero. There is evidence at the 5% level that Height and Weight are linearly ...

  7. HYPOTHESIS TESTING for ZERO CORRELATION 1C

    Chapter 1 Regression, Correlation and Hypothesis Testing, section 1.3 (1C): Hypothesis Testing for Zero Correlation00:00 Theory and full definitions. Clear a...

  8. PDF Lecture 2: Hypothesis testing and correlation

    of testing differences in means (or medians) of two groups, but hypothesis testing can be applied in other circumstances. One case of note is testing whether the mean (or median) of a single group is different from zero. To address this case, let's pose the null hypothesis that the observed data come from a probability distribution that has ...

  9. 13.2 Testing the Significance of the Correlation Coefficient

    Alternate Hypothesis H a: The population correlation coefficient is significantly different from zero. There is a significant linear relationship (correlation) between X 1 and X 2 in the population. Drawing a Conclusion There are two methods of making the decision concerning the hypothesis.

  10. Hypothesis Testing for Zero Correlation

    In this video I show you how to do hypothesis testing for zero correlation from a population by taking a sample of observations from the population calculati...

  11. Zero Correlation

    tests the hypothesis of a zero correlation using the heteroscedastic bootstrap method just described. By default, it uses the percentage bend correlation, but any correlation can be specified by the argument corfun. For example, the command corb (x,y,corfun=wincor,tr= 0.25) will use a 25% Winsorized correlation.

  12. Interpreting Correlation Coefficients

    The p-value is for a hypothesis test that determines whether your correlation value is significantly different from zero (no correlation). If we take your -0.002 correlation and it's p-value (0.995), we'd interpret that as meaning that your sample contains insufficient evidence to conclude that the population correlation is not zero.

  13. Pearson Correlation Coefficient (r)

    The degrees of freedom (df): For Pearson correlation tests, the formula is df = n - 2. Significance level ... (a zero before the decimal point) since the Pearson correlation coefficient can't be greater than one or less than negative one. ... Hypothesis testing is a formal procedure for investigating our ideas about the world. It allows you ...

  14. Hypothesis Test for Correlation

    The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is "close to zero" or "significantly different from zero.". We decide this based on the sample correlation coefficient r and the sample size n. If the test concludes that the correlation coefficient is significantly different from zero, we ...

  15. Correlation Coefficient

    Correlation analysis example You check whether the data meet all of the assumptions for the Pearson's r correlation test. Both variables are quantitative and normally distributed with no outliers, so you calculate a Pearson's r correlation coefficient. The correlation coefficient is strong at .58. Interpreting a correlation coefficient

  16. Testing the Significance of the Correlation Coefficient

    The correlation coefficient, r, tells us about the strength and direction of the linear relationship between x and y.However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n, together.. We perform a hypothesis test of the "significance of the ...

  17. 6.3

    First, consider testing the null hypothesis that a partial correlation is equal to zero against the alternative that it is not equal to zero. This is expressed below: H 0: ρ j k .x = 0 against H a: ρ j k .x ≠ 0. Here we will use a test statistic that is similar to the one we used for an ordinary correlation. This test statistic is shown below:

  18. 2.5.2 Hypothesis Testing for Correlation

    You should be familiar with using a hypothesis test to determine bias within probability problems. It is also possible to use a hypothesis test to determine whether a given product moment correlation coefficient calculated from a sample could be representative of the same relationship existing within the whole population. For full information on hypothesis testing, see the revision notes from ...

  19. Edexcel A Level Maths : 1.3 Hypothesis Testing for Zero Correlation

    https://www.buymeacoffee.com/zeeshanzamurredPearson A level maths, applied year 2 textbook (1.3) In this video I cover: 1. Sample product moment correlation ...

  20. 12.5: Testing the Significance of the Correlation Coefficient

    The formula for the test statistic is t = r n−2√ 1−r2√ t = r n − 2 1 − r 2. The value of the test statistic, t t, is shown in the computer or calculator output along with the p-value p -value. The test statistic t t has the same sign as the correlation coefficient r r.

  21. 2.5.2 Hypothesis Testing for Correlation

    You should be familiar with using a hypothesis test to determine bias within probability problems. It is also possible to use a hypothesis test to determine whether a given product moment correlation coefficient calculated from a sample could be representative of the same relationship existing within the whole population. For full information on hypothesis testing, see the revision notes from ...

  22. hypothesis testing

    Stack Exchange Network. Stack Exchange network consists of 183 Q&A communities including Stack Overflow, the largest, most trusted online community for developers to learn, share their knowledge, and build their careers.. Visit Stack Exchange

  23. 10.1: Testing the Significance of the Correlation Coefficient

    The formula for the test statistic is t = r√n − 2 √1 − r2. The value of the test statistic, t, is shown in the computer or calculator output along with the p-value. The test statistic t has the same sign as the correlation coefficient r. The p-value is the combined area in both tails.

  24. 12.2.1: Hypothesis Test for Linear Regression

    The test statistic value is the same value of the t-test for correlation even though they used different formulas. We look in the same place using technology as the correlation test. The test statistic is greater than the critical value of 2.160 and in the rejection region. The decision is to reject \(H_{0}\).