
Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans. Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis (H0) and alternate hypothesis (Ha or H1).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test.
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

  • Step 1: State your null and alternate hypothesis
  • Step 2: Collect data
  • Step 3: Perform a statistical test
  • Step 4: Decide whether to reject or fail to reject your null hypothesis
  • Step 5: Present your findings
  • Other interesting articles
  • Frequently asked questions about hypothesis testing

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H0) and alternate (Ha) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H0: Men are, on average, not taller than women.
  • Ha: Men are, on average, taller than women.


For a statistical test to be valid, it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p-value. This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p-value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data. For the height example, a test comparing the two group means would give you:

  • an estimate of the difference in average height between the two groups.
  • a p-value showing how likely you are to see this difference if the null hypothesis of no difference is true.
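As a sketch of what such a test produces, here is a minimal two-sample z test in Python. The summary statistics (means, standard deviation, group sizes) are invented for illustration, since the article does not give any:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary statistics (not from the article), heights in cm.
m_men, m_women = 175.0, 169.0
sd, n = 7.0, 50                      # assume equal spread and group sizes

diff = m_men - m_women               # estimate of the difference
se = sqrt(sd**2 / n + sd**2 / n)     # standard error of that difference
z = diff / se
p = 1 - NormalDist().cdf(z)          # one-sided p-value for Ha: men taller

print(f"difference = {diff:.1f} cm, z = {z:.2f}, p = {p:.2g}")
```

With these made-up numbers the difference is large relative to its standard error, so the p-value comes out far below .05 and the null hypothesis of no difference would be rejected.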

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p-value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05: that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis (Type I error).

The results of hypothesis testing will be presented in the results and discussion sections of your research paper, dissertation, or thesis.

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p-value). In the discussion, you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as support for the alternate hypothesis.

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis. This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis. But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis.

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

Statistics

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing. The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.


Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved April 12, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/



Chapter 7: Introduction to Hypothesis Testing

Key terms:

  • alternative hypothesis
  • critical value
  • effect size
  • null hypothesis
  • probability value
  • rejection region
  • significance level
  • statistical power
  • statistical significance
  • test statistic
  • Type I error
  • Type II error

This chapter lays out the basic logic and process of hypothesis testing. We will perform z  tests, which use the z  score formula from Chapter 6 and data from a sample mean to make an inference about a population.

Logic and Purpose of Hypothesis Testing

A hypothesis is a prediction that is tested in a research study. The statistician R. A. Fisher explained the concept of hypothesis testing with a story of a lady tasting tea. Here we will present an example based on James Bond who insisted that martinis should be shaken rather than stirred. Let’s consider a hypothetical experiment to determine whether Mr. Bond can tell the difference between a shaken martini and a stirred martini. Suppose we gave Mr. Bond a series of 16 taste tests. In each test, we flipped a fair coin to determine whether to stir or shake the martini. Then we presented the martini to Mr. Bond and asked him to decide whether it was shaken or stirred. Let’s say Mr. Bond was correct on 13 of the 16 taste tests. Does this prove that Mr. Bond has at least some ability to tell whether the martini was shaken or stirred?

This result does not prove that he does; it could be he was just lucky and guessed right 13 out of 16 times. But how plausible is the explanation that he was just lucky? To assess its plausibility, we determine the probability that someone who was just guessing would be correct 13/16 times or more. This probability can be computed to be .0106. This is a pretty low probability, and therefore someone would have to be very lucky to be correct 13 or more times out of 16 if they were just guessing. So either Mr. Bond was very lucky, or he can tell whether the drink was shaken or stirred. The hypothesis that he was guessing is not proven false, but considerable doubt is cast on it. Therefore, there is strong evidence that Mr. Bond can tell whether a drink was shaken or stirred.

Let’s consider another example. The case study Physicians’ Reactions sought to determine whether physicians spend less time with obese patients. Physicians were sampled randomly and each was shown a chart of a patient complaining of a migraine headache. They were then asked to estimate how long they would spend with the patient. The charts were identical except that for half the charts, the patient was obese and for the other half, the patient was of average weight. The chart a particular physician viewed was determined randomly. Thirty-three physicians viewed charts of average-weight patients and 38 physicians viewed charts of obese patients.

The mean time physicians reported that they would spend with obese patients was 24.7 minutes as compared to a mean of 31.4 minutes for normal-weight patients. How might this difference between means have occurred? One possibility is that physicians were influenced by the weight of the patients. On the other hand, perhaps by chance, the physicians who viewed charts of the obese patients tend to see patients for less time than the other physicians. Random assignment of charts does not ensure that the groups will be equal in all respects other than the chart they viewed. In fact, it is certain the groups differed in many ways by chance. The two groups could not have exactly the same mean age (if measured precisely enough such as in days). Perhaps a physician’s age affects how long the physician sees patients. There are innumerable differences between the groups that could affect how long they view patients. With this in mind, is it plausible that these chance differences are responsible for the difference in times?

To assess the plausibility of the hypothesis that the difference in mean times is due to chance, we compute the probability of getting a difference as large or larger than the observed difference (31.4 − 24.7 = 6.7 minutes) if the difference were, in fact, due solely to chance. Using methods presented in later chapters, this probability can be computed to be .0057. Since this is such a low probability, we have confidence that the difference in times is due to the patient’s weight and is not due to chance.

The Probability Value

It is very important to understand precisely what the probability values mean. In the James Bond example, the computed probability of .0106 is the probability he would be correct on 13 or more taste tests (out of 16) if he were just guessing. It is easy to mistake this probability of .0106 as the probability he cannot tell the difference. This is not at all what it means.

The probability of .0106 is the probability of a certain outcome (13 or more out of 16) assuming a certain state of the world (James Bond was only guessing). It is not the probability that a state of the world is true. Although this might seem like a distinction without a difference, consider the following example. An animal trainer claims that a trained bird can determine whether or not numbers are evenly divisible by 7. In an experiment assessing this claim, the bird is given a series of 16 test trials. On each trial, a number is displayed on a screen and the bird pecks at one of two keys to indicate its choice. The numbers are chosen in such a way that the probability of any number being evenly divisible by 7 is .50. The bird is correct on 9/16 choices. We can compute that the probability of being correct nine or more times out of 16 if one is only guessing is .40. Since a bird who is only guessing would do this well 40% of the time, these data do not provide convincing evidence that the bird can tell the difference between the two types of numbers. As a scientist, you would be very skeptical that the bird had this ability. Would you conclude that there is a .40 probability that the bird can tell the difference? Certainly not! You would think the probability is much lower than .0001.

To reiterate, the probability value is the probability of an outcome (9/16 or better) and not the probability of a particular state of the world (the bird was only guessing). In statistics, it is conventional to refer to possible states of the world as hypotheses since they are hypothesized states of the world. Using this terminology, the probability value is the probability of an outcome given the hypothesis. It is not the probability of the hypothesis given the outcome.
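Both probability values quoted above can be checked with an exact binomial tail calculation, since a pure guesser is right on each trial with probability .5:

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more successes."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(round(binom_tail(13, 16), 4))  # Mr. Bond, 13 or more of 16 -> 0.0106
print(round(binom_tail(9, 16), 4))   # the bird, 9 or more of 16  -> 0.4018
```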

This is not to say that we ignore the probability of the hypothesis. If the probability of the outcome given the hypothesis is sufficiently low, we have evidence that the hypothesis is false. However, we do not compute the probability that the hypothesis is false. In the James Bond example, the hypothesis is that he cannot tell the difference between shaken and stirred martinis. The probability value is low (.0106), thus providing evidence that he can tell the difference. However, we have not computed the probability that he can tell the difference.

The Null Hypothesis

The hypothesis that an apparent effect is due to chance is called the null hypothesis, written H0 (“H-naught”). In the Physicians’ Reactions example, the null hypothesis is that in the population of physicians, the mean time expected to be spent with obese patients is equal to the mean time expected to be spent with average-weight patients. This null hypothesis can be written as:

H0: μ_obese = μ_average

The null hypothesis in a correlational study of the relationship between high school grades and college grades would typically be that the population correlation is 0. This can be written as

H0: ρ = 0

Although the null hypothesis is usually that the value of a parameter is 0, there are occasions in which the null hypothesis is a value other than 0. For example, if we are working with mothers in the U.S. whose children are at risk of low birth weight, we can use 7.47 pounds, the average birth weight in the U.S., as our null value and test for differences against that.

For now, we will focus on testing a value of a single mean against what we expect from the population. Using birth weight as an example, our null hypothesis takes the form:

H0: μ = 7.47

Keep in mind that the null hypothesis is typically the opposite of the researcher’s hypothesis. In the Physicians’ Reactions study, the researchers hypothesized that physicians would expect to spend less time with obese patients. The null hypothesis that the two types of patients are treated identically is put forward with the hope that it can be discredited and therefore rejected. If the null hypothesis were true, a difference as large as or larger than the sample difference of 6.7 minutes would be very unlikely to occur. Therefore, the researchers rejected the null hypothesis of no difference and concluded that in the population, physicians intend to spend less time with obese patients.

In general, the null hypothesis is the idea that nothing is going on: there is no effect of our treatment, no relationship between our variables, and no difference in our sample mean from what we expected about the population mean. This is always our baseline starting assumption, and it is what we seek to reject. If we are trying to treat depression, we want to find a difference in average symptoms between our treatment and control groups. If we are trying to predict job performance, we want to find a relationship between conscientiousness and evaluation scores. However, until we have evidence against it, we must use the null hypothesis as our starting point.

The Alternative Hypothesis

If the null hypothesis is rejected, then we will need some other explanation, which we call the alternative hypothesis, HA or H1. The alternative hypothesis is simply the reverse of the null hypothesis, and there are three options, depending on where we expect the difference to lie. Thus, our alternative hypothesis is the mathematical way of stating our research question. If we expect our obtained sample mean to be above or below the null hypothesis value (which we call a directional hypothesis), then our alternative hypothesis takes the form

HA: μ > 7.47  or  HA: μ < 7.47

based on the research question itself. We should only use a directional hypothesis if we have good reason, based on prior observations or research, to suspect a particular direction. When we do not know the direction, such as when we are entering a new area of research, we use a non-directional alternative:

HA: μ ≠ 7.47

We will set different criteria for rejecting the null hypothesis based on the directionality (greater than, less than, or not equal to) of the alternative. To understand why, we need to see where our criteria come from and how they relate to z  scores and distributions.

Critical Values, p Values, and Significance Level

The significance level, also known as alpha (α), is a threshold we set before collecting data in order to determine whether or not we should reject the null hypothesis. We set this value beforehand to avoid biasing ourselves by viewing our results and then determining what criteria we should use. If our data produce values that meet or exceed this threshold, then we have sufficient evidence to reject the null hypothesis; if not, we fail to reject the null (we never “accept” the null).

Figure 7.1. The rejection region for a one-tailed test. (“ Rejection Region for One-Tailed Test ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)


The rejection region is bounded by a specific z value, as is any area under the curve. In hypothesis testing, the value corresponding to a specific rejection region is called the critical value, z_crit (“z crit”), or z* (hence the other name “critical region”). Finding the critical value works exactly the same as finding the z score corresponding to any area under the curve as we did in Unit 1. If we go to the normal table, we will find that the z score corresponding to 5% of the area under the curve is equal to 1.645 (z = 1.64 corresponds to .0505 and z = 1.65 corresponds to .0495, so .05 is exactly in between them) if we go to the right and −1.645 if we go to the left. The direction must be determined by your alternative hypothesis, and drawing and shading the distribution is helpful for keeping directionality straight.

Suppose, however, that we want to do a non-directional test. We need to put the critical region in both tails, but we don’t want to increase the overall size of the rejection region (for reasons we will see later). To do this, we simply split it in half so that an equal proportion of the area under the curve falls in each tail’s rejection region. For α = .05, this means 2.5% of the area is in each tail, which, based on the z table, corresponds to critical values of z* = ±1.96. This is shown in Figure 7.2.

Figure 7.2. Two-tailed rejection region. (“ Rejection Region for Two-Tailed Test ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)

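The critical values quoted here can be confirmed with the inverse normal CDF; `statistics.NormalDist` (Python 3.8+) replaces the table lookup:

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()                          # mean 0, standard deviation 1

one_tailed = std_normal.inv_cdf(1 - alpha)         # all of alpha in one tail
two_tailed = std_normal.inv_cdf(1 - alpha / 2)     # alpha split between tails

print(round(one_tailed, 3), round(two_tailed, 2))  # 1.645 1.96
```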

Thus, any z score falling outside ±1.96 (greater than 1.96 in absolute value) falls in the rejection region. When we use z scores in this way, the obtained value of z (sometimes called z obtained and abbreviated z_obt) is something known as a test statistic, which is simply an inferential statistic used to test a null hypothesis. The formula for our z statistic has not changed:

z = (M − μ) / (σ/√n)

Figure 7.3. Relationship between α, z_obt, and p. (“Relationship between alpha, z-obt, and p” by Judy Schmitt is licensed under CC BY-NC-SA 4.0.)


When the null hypothesis is rejected, the effect is said to have statistical significance , or be statistically significant. For example, in the Physicians’ Reactions case study, the probability value is .0057. Therefore, the effect of obesity is statistically significant and the null hypothesis that obesity makes no difference is rejected. It is important to keep in mind that statistical significance means only that the null hypothesis of exactly no effect is rejected; it does not mean that the effect is important, which is what “significant” usually means. When an effect is significant, you can have confidence the effect is not exactly zero. Finding that an effect is significant does not tell you about how large or important the effect is.

Do not confuse statistical significance with practical significance. A small effect can be highly significant if the sample size is large enough.

Why does the word “significant” in the phrase “statistically significant” mean something so different from other uses of the word? Interestingly, this is because the meaning of “significant” in everyday language has changed. It turns out that when the procedures for hypothesis testing were developed, something was “significant” if it signified something. Thus, finding that an effect is statistically significant signifies that the effect is real and not due to chance. Over the years, the meaning of “significant” changed, leading to the potential misinterpretation.

The Hypothesis Testing Process

The process of testing hypotheses follows a simple four-step procedure. This process will be what we use for the remainder of the textbook and course, and although the hypotheses and statistics we use will change, this process will not.

Step 1: State the Hypotheses

Your hypotheses are the first thing you need to lay out. Otherwise, there is nothing to test! You have to state the null hypothesis (which is what we test) and the alternative hypothesis (which is what we expect). These should be stated both mathematically, as they were presented above, and in words, explaining in plain English what each one means in terms of the research question.

Step 2: Find the Critical Values

Next, we determine the criteria for our decision: using the significance level and the directionality of our alternative hypothesis, we find the critical value or values that bound the rejection region.

Step 3: Calculate the Test Statistic and Effect Size

Once we have our hypotheses and the standards we use to test them, we can collect data and calculate our test statistic—in this case z . This step is where the vast majority of differences in future chapters will arise: different tests used for different data are calculated in different ways, but the way we use and interpret them remains the same. As part of this step, we will also calculate effect size to better quantify the magnitude of the difference between our groups. Although effect size is not considered part of hypothesis testing, reporting it as part of the results is approved convention.

Step 4: Make the Decision

Finally, once we have our obtained test statistic, we can compare it to our critical value and decide whether we should reject or fail to reject the null hypothesis. When we do this, we must interpret the decision in relation to our research question, stating what we concluded, what we based our conclusion on, and the specific statistics we obtained.

Example A: Movie Popcorn

Our manager is looking for a difference in the mean weight of popcorn bags compared to the population mean of 8 cups. We will need both a null and an alternative hypothesis written both mathematically and in words. We’ll always start with the null hypothesis:

H0: μ = 8 (this employee’s popcorn bags do not differ from the 8-cup average)

In this case, we don’t know if the bags will be too full or not full enough, so we use a two-tailed alternative hypothesis that there is a difference: HA: μ ≠ 8.

Our critical values are based on two things: the directionality of the test and the level of significance. We decided in Step 1 that a two-tailed test is the appropriate directionality. We were given no information about the level of significance, so we assume that α = .05 is what we will use. As stated earlier in the chapter, the critical values for a two-tailed z test at α = .05 are z* = ±1.96. These will be the criteria we use to test our hypothesis. We can now draw out our distribution, as shown in Figure 7.4, so we can visualize the rejection region and make sure it makes sense.

Figure 7.4. Rejection region for z * = ±1.96. (“ Rejection Region z+-1.96 ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)


Now we come to our formal calculations. Let’s say that the manager collects data and finds that the average weight of this employee’s popcorn bags is M = 7.75 cups. We can now plug this value, along with the values presented in the original problem, into our equation for z :

z = (7.75 − 8.00) / (0.50/√25) = −0.25 / 0.10 = −2.50

So our test statistic is z = −2.50, which we can draw onto our rejection region distribution as shown in Figure 7.5 .

Figure 7.5. Test statistic location. (“ Test Statistic Location z-2.50 ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)


Effect Size

When we reject the null hypothesis, we are stating that the difference we found was statistically significant, but we have mentioned several times that this tells us nothing about practical significance. To get an idea of the actual size of what we found, we can compute a new statistic called an effect size. Effect size gives us an idea of how large, important, or meaningful a statistically significant effect is. For mean differences like we calculated here, our effect size is Cohen’s d :

d = (M − μ) / σ

This is very similar to our formula for z , but we no longer take into account the sample size (since overly large samples can make it too easy to reject the null). Cohen’s d is interpreted in units of standard deviations, just like z . For our example:

d = (7.75 − 8.00) / 0.50 = −0.50

Cohen’s d is interpreted as small, moderate, or large. Specifically, d = 0.20 is small, d = 0.50 is moderate, and d = 0.80 is large. Obviously, values can fall in between these guidelines, so we should use our best judgment and the context of the problem to make our final interpretation of size. Our effect size happens to be exactly equal to one of these, so we say that there is a moderate effect.
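Those benchmarks can be wrapped in a small helper. The cutoffs below follow the conventional labels from the text; calling anything under 0.20 “negligible” is a filler label of my own, and borderline values still call for judgment:

```python
def interpret_d(d):
    """Label the magnitude of Cohen's d using the conventional benchmarks."""
    size = abs(d)            # only magnitude matters, not direction
    if size < 0.20:
        return "negligible"  # below the smallest conventional benchmark (assumed label)
    elif size < 0.50:
        return "small"
    elif size < 0.80:
        return "moderate"
    return "large"

print(interpret_d(-0.50))    # the popcorn example -> moderate
```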

Effect sizes are incredibly useful and provide important information and clarification that overcomes some of the weakness of hypothesis testing. Any time you perform a hypothesis test, whether statistically significant or not, you should always calculate and report effect size.

Looking at Figure 7.5, we can see that our obtained z statistic falls in the rejection region. We can also directly compare it to our critical value: in absolute value, 2.50 > 1.96, so we reject the null hypothesis. We can now write our conclusion:

Reject H0. Based on the sample of 25 bags, we can conclude that the average popcorn bag from this employee is smaller (M = 7.75 cups) than the average weight of popcorn bags at this movie theater, and the effect size was moderate, z = −2.50, p < .05, d = 0.50.
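The whole popcorn test can be reproduced in a few lines. The sample size (n = 25) comes from the conclusion above; σ = 0.50 cups is inferred from the reported z and d rather than stated in this excerpt:

```python
from math import sqrt

mu = 8.00                # null-hypothesis population mean (cups)
sigma = 0.50             # population SD, inferred from the reported statistics
M, n = 7.75, 25          # sample mean and sample size from the example

z = (M - mu) / (sigma / sqrt(n))   # test statistic
d = (M - mu) / sigma               # Cohen's d (sample size plays no role)
reject = abs(z) > 1.96             # two-tailed decision at alpha = .05

print(f"z = {z:.2f}, d = {d:.2f}, reject H0: {reject}")  # z = -2.50, d = -0.50, reject H0: True
```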

Example B: Office Temperature

Let’s do another example to solidify our understanding. Let’s say that the office building you work in is supposed to be kept at 74 degrees Fahrenheit during the summer months but is allowed to vary by 1 degree in either direction. You suspect that, as a cost-saving measure, the temperature was secretly set higher. You set up a formal way to test your hypothesis.

You start by laying out the null hypothesis:

H0: μ = 74 (the average building temperature is the required 74 degrees)

Next you state the alternative hypothesis. You have reason to suspect a specific direction of change, so you make a one-tailed test:

HA: μ > 74 (the average building temperature is higher than 74 degrees)

You know that the most common level of significance is α = .05, so you keep that the same and know that the critical value for a one-tailed z test is z* = 1.645. To keep track of the directionality of the test and rejection region, you draw out your distribution as shown in Figure 7.6.

Figure 7.6. Rejection region. (“ Rejection Region z1.645 ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)


Now that you have everything set up, you spend one week collecting temperature data:

The n = 5 readings average M = 76.6 degrees. Treating the allowed 1-degree variation as the population standard deviation (σ = 1), the test statistic is z = (76.6 − 74.0) / (1/√5) ≈ 2.60 / 0.45 = 5.77.

This value falls so far into the tail that it cannot even be plotted on the distribution ( Figure 7.7 )! Because the result is significant, you also calculate an effect size:

d = (76.6 − 74.0) / 1.00 = 2.60

The effect size you calculate is definitely large, meaning someone has some explaining to do!

Figure 7.7. Obtained z statistic. (“ Obtained z5.77 ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)


You compare your obtained z  statistic, z = 5.77, to the critical value, z * = 1.645, and find that z > z *. Therefore you reject the null hypothesis, concluding:

Reject H0. Based on 5 observations, the average temperature (M = 76.6 degrees) is statistically significantly higher than it is supposed to be, and the effect size was large, z = 5.77, p < .05, d = 2.60.

Example C: A Different Significance Level

Finally, let’s take a look at an example phrased in generic terms, rather than in the context of a specific research question, to see the individual pieces one more time. This time, however, we will use a stricter significance level, α = .01, to test the hypothesis.

We will use 60 as an arbitrary null hypothesis value:

H0: μ = 60

We will assume a two-tailed test:

HA: μ ≠ 60

We have seen the critical values for z tests at the α = .05 level of significance several times. To find the values for α = .01, we will go to the standard normal distribution table and find the z score cutting off .005 (.01 divided by 2 for a two-tailed test) of the area in the tail, which is z* = ±2.575. Notice that this cutoff is much higher than it was for α = .05. This is because we need much less of the area in the tail, so we need to go very far out to find the cutoff. As a result, this will require a much larger effect or much larger sample size in order to reject the null hypothesis.
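As a check on the table lookup, the same cutoff can be computed directly (it is 2.576 to three decimals, often rounded to 2.575 in tables):

```python
from statistics import NormalDist

alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper cutoff; lower is -z_crit
print(round(z_crit, 3))                       # 2.576
```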

We can now calculate our test statistic. We will use σ = 10 as our known population standard deviation, along with a sample of n = 10 scores, to calculate our sample mean.

The average of these scores is M = 60.40. From this we calculate our z  statistic as:

[latex]z_{test}=\frac{M-\mu}{\sigma/\sqrt{n}}=\frac{60.40-60.00}{10/\sqrt{10}}=\frac{0.40}{3.16}=0.13[/latex]

The Cohen’s d effect size calculation is:

[latex]d=\bigg |\frac{60.40-60.00}{10}\bigg |=0.04[/latex]

Our obtained z  statistic, z = 0.13, is very small. It is much less than our critical value of 2.575. Thus, this time, we fail to reject the null hypothesis. Our conclusion would look something like:

Fail to reject H 0 . Based on the sample of 10 scores, we cannot conclude that there is an effect causing the mean ( M  = 60.40) to be statistically significantly different from 60.00, z = 0.13, p > .01, d = 0.04, and the effect size supports this interpretation.
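
Example C's numbers can be verified in a few lines of Python (a sketch using only the values given above):

```python
from math import sqrt

mu0, sigma, n, M = 60.00, 10, 10, 60.40  # values from Example C

sigma_M = sigma / sqrt(n)                # standard error of the mean
z = (M - mu0) / sigma_M                  # obtained z statistic
d = abs(M - mu0) / sigma                 # Cohen's d

print(round(z, 2), round(d, 2))          # 0.13 0.04
```

Both values match the worked example, and |z| is far below the critical value of 2.575, so we fail to reject the null hypothesis.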

Other Considerations in Hypothesis Testing

There are several other considerations we need to keep in mind when performing hypothesis testing.

Errors in Hypothesis Testing

In the Physicians’ Reactions case study, the probability value associated with the significance test is .0057. Therefore, the null hypothesis was rejected, and it was concluded that physicians intend to spend less time with obese patients. Despite the low probability value, it is possible that the null hypothesis of no true difference between obese and average-weight patients is true and that the large difference between sample means occurred by chance. If this is the case, then the conclusion that physicians intend to spend less time with obese patients is in error. This type of error is called a Type I error. More generally, a Type I error occurs when a significance test results in the rejection of a true null hypothesis.

The second type of error that can be made in significance testing is failing to reject a false null hypothesis. This kind of error is called a Type II error . Unlike a Type I error, a Type II error is not really an error. When a statistical test is not significant, it means that the data do not provide strong evidence that the null hypothesis is false. Lack of significance does not support the conclusion that the null hypothesis is true. Therefore, a researcher should not make the mistake of incorrectly concluding that the null hypothesis is true when a statistical test was not significant. Instead, the researcher should consider the test inconclusive. Contrast this with a Type I error in which the researcher erroneously concludes that the null hypothesis is false when, in fact, it is true.

A Type II error can only occur if the null hypothesis is false. If the null hypothesis is false, then the probability of a Type II error is called β (“beta”). The probability of correctly rejecting a false null hypothesis equals 1 − β and is called statistical power . Power is simply our ability to correctly detect an effect that exists. It is influenced by the size of the effect (larger effects are easier to detect), the significance level we set (making it easier to reject the null makes it easier to detect an effect, but increases the likelihood of a Type I error), and the sample size used (larger samples make it easier to reject the null).
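
These three influences on power can be made concrete with a small calculation. The sketch below is not from the text and uses illustrative numbers; it computes the power of a one-tailed z test when the true mean differs from the null by d standard deviations:

```python
from math import sqrt
from statistics import NormalDist

def power(d, n, alpha=0.05):
    # One-tailed z test: under the true mean, the z statistic is centered
    # at d * sqrt(n) instead of 0, so power is the area beyond z_crit.
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_crit - d * sqrt(n))

print(round(power(0.5, 25), 2))   # about 0.80: medium effect, n = 25
print(round(power(0.5, 10), 2))   # same effect, smaller sample: lower power
print(round(power(0.2, 25), 2))   # smaller effect, same sample: lower power
```

Raising the effect size or the sample size raises power, exactly as the paragraph above describes.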

Misconceptions in Hypothesis Testing

Misconceptions about significance testing are common. This section lists three important ones.

  • Misconception: The probability value ( p value) is the probability that the null hypothesis is false. Proper interpretation: The probability value ( p value) is the probability of a result as extreme or more extreme given that the null hypothesis is true. It is the probability of the data given the null hypothesis. It is not the probability that the null hypothesis is false.
  • Misconception: A low probability value indicates a large effect. Proper interpretation: A low probability value indicates that the sample outcome (or an outcome more extreme) would be very unlikely if the null hypothesis were true. A low probability value can occur with small effect sizes, particularly if the sample size is large.
  • Misconception: A non-significant outcome means that the null hypothesis is probably true. Proper interpretation: A non-significant outcome means that the data do not conclusively demonstrate that the null hypothesis is false.
  • In your own words, explain what the null hypothesis is.
  • What are Type I and Type II errors?
  • Why do we phrase null and alternative hypotheses with population parameters and not sample means?
  • Why do we state our hypotheses and decision criteria before we collect our data?
  • Why do you calculate an effect size?
  • z = 1.99, two-tailed test at α = .05
  • z = 0.34, z * = 1.645
  • p = .03, α = .05
  • p = .015, α = .01

Answers to Odd-Numbered Exercises

Your answer should include mention of the baseline assumption of no difference between the sample and the population.

Alpha is the significance level. It is the criterion we use when deciding to reject or fail to reject the null hypothesis, corresponding to a given proportion of the area under the normal distribution and a probability of finding extreme scores assuming the null hypothesis is true.

We always calculate an effect size to see if our research is practically meaningful or important. NHST (null hypothesis significance testing) is influenced by sample size but effect size is not; therefore, they provide complementary information.


“ Null Hypothesis ” by Randall Munroe/xkcd.com is licensed under CC BY-NC 2.5 .


Introduction to Statistics in the Psychological Sciences Copyright © 2021 by Linda R. Cote Ph.D.; Rupa G. Gordon Ph.D.; Chrislyn E. Randell Ph.D.; Judy Schmitt; and Helena Marvin is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Statology


Introduction to Hypothesis Testing

A statistical hypothesis is an assumption about a population parameter .

For example, we may assume that the mean height of a male in the U.S. is 70 inches.

The assumption about the height is the statistical hypothesis and the true mean height of a male in the U.S. is the population parameter .

A hypothesis test is a formal statistical test we use to reject or fail to reject a statistical hypothesis.

The Two Types of Statistical Hypotheses

To test whether a statistical hypothesis about a population parameter is true, we obtain a random sample from the population and perform a hypothesis test on the sample data.

There are two types of statistical hypotheses:

The null hypothesis , denoted as H 0 , is the hypothesis that the sample data occurs purely from chance.

The alternative hypothesis , denoted as H 1 or H a , is the hypothesis that the sample data is influenced by some non-random cause.

Hypothesis Tests

A hypothesis test consists of five steps:

1. State the hypotheses. 

State the null and alternative hypotheses. These two hypotheses need to be mutually exclusive, so if one is true then the other must be false.

2. Determine a significance level to use for the hypothesis.

Decide on a significance level. Common choices are .01, .05, and .1. 

3. Find the test statistic.

Find the test statistic and the corresponding p-value. Often we are analyzing a population mean or proportion and the general formula to find the test statistic is: (sample statistic – population parameter) / (standard deviation of statistic)

4. Reject or fail to reject the null hypothesis.

Using the test statistic or the p-value, determine if you can reject or fail to reject the null hypothesis based on the significance level.

The p-value  tells us how likely the observed result (or one more extreme) would be if the null hypothesis were true, so a small p-value is strong evidence against the null hypothesis. If the p-value is less than the significance level, we reject the null hypothesis.

5. Interpret the results. 

Interpret the results of the hypothesis test in the context of the question being asked. 
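
The five steps can be sketched in code. Below is a minimal one-sample z test in Python; the function name and the sample values (M = 68.5, σ = 4, n = 36) are made up for illustration, and only the hypothesized mean of 70 inches comes from the height example above:

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(M, mu0, sigma, n, alpha=0.05, two_tailed=True):
    """Return the test statistic, its p-value, and the reject decision."""
    z = (M - mu0) / (sigma / sqrt(n))      # (sample statistic - parameter) / SE
    tail = 1 - NormalDist().cdf(abs(z))    # area beyond |z| in one tail
    p = 2 * tail if two_tailed else tail
    return z, p, p < alpha                 # True means "reject H0"

# Hypothetical sample: 36 men averaging 68.5 inches, known sigma = 4 inches,
# testing H0: mu = 70 versus Ha: mu != 70.
z, p, reject = one_sample_z_test(M=68.5, mu0=70, sigma=4, n=36)
print(round(z, 2), round(p, 4), reject)    # -2.25 0.0244 True
```

With these numbers the p-value falls below .05, so the test rejects the null hypothesis; a sample mean equal to 70 would give p = 1 and fail to reject.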

The Two Types of Decision Errors

There are two types of decision errors that one can make when doing a hypothesis test:

Type I error: You reject the null hypothesis when it is actually true. The probability of committing a Type I error is equal to the significance level, often called  alpha , and denoted as α.

Type II error: You fail to reject the null hypothesis when it is actually false. The probability of committing a Type II error is called beta , denoted as β. The probability of avoiding a Type II error (1 − β) is called the power of the test.

One-Tailed and Two-Tailed Tests

A statistical hypothesis can be one-tailed or two-tailed.

A one-tailed hypothesis involves making a “greater than” or “less than” statement.

For example, suppose we assume the mean height of a male in the U.S. is greater than or equal to 70 inches. The null hypothesis would be H0: µ ≥ 70 inches and the alternative hypothesis would be Ha: µ < 70 inches.

A two-tailed hypothesis involves making an “equal to” or “not equal to” statement.

For example, suppose we assume the mean height of a male in the U.S. is equal to 70 inches. The null hypothesis would be H0: µ = 70 inches and the alternative hypothesis would be Ha: µ ≠ 70 inches.

Note: The “equal” sign is always included in the null hypothesis, whether it is =, ≥, or ≤.

Related:   What is a Directional Hypothesis?

Types of Hypothesis Tests

There are many different types of hypothesis tests you can perform depending on the type of data you’re working with and the goal of your analysis.

The following tutorials provide an explanation of the most common types of hypothesis tests:

  • Introduction to the One Sample t-test
  • Introduction to the Two Sample t-test
  • Introduction to the Paired Samples t-test
  • Introduction to the One Proportion Z-Test
  • Introduction to the Two Proportion Z-Test


Published by Zach



9 Introduction to Hypothesis Testing

Learning outcomes

In this chapter, you will learn how to:

  • Identify the components of a hypothesis test.
  • State the hypotheses and identify appropriate critical areas.
  • Describe the proper interpretations of a p -value as well as common misinterpretations.
  • Distinguish between the two types of error in hypothesis testing.
  • Conduct a hypothesis test using a z-score statistic.
  • Explain the purpose of measuring effect size.
  • Compute Cohen’s d.
  • Identify the assumption underlying a test statistic.

In the first unit we discussed the three major goals of statistics.

The last two goals are related to the idea of hypothesis testing and inferential statistics, while the first is clearly tied to descriptive statistics. The remaining chapters will cover many different kinds of hypothesis tests connected to different inferential statistics. There is a lot of new language to learn when conducting a hypothesis test, but some of the components of a hypothesis test are topics we are already familiar with: probability and the distribution of sample means.

Hypothesis testing is an inferential procedure that uses data from a sample to draw a general conclusion about a population. When interpreting a research question and statistical results, a natural question arises as to whether the finding could have occurred by chance. In this chapter, we will introduce the ideas behind the use of statistics to make decisions – in particular, decisions about whether a particular hypothesis is supported by the data.

Logic and Purpose of Hypothesis Testing

Let’s consider an example. The case study Physicians’ Reactions sought to determine whether physicians spend less time with obese patients. Physicians were sampled randomly and each was shown a chart of a patient complaining of a migraine headache. They were then asked to estimate how long they would spend with the patient. The charts were identical except that for half the charts, the patient was obese and for the other half, the patient was of average weight. The chart a particular physician viewed was determined randomly. Thirty-three physicians viewed charts of average-weight patients and 38 physicians viewed charts of obese patients.

The mean time physicians reported that they would spend with obese patients was 24.7 minutes as compared to a mean of 31.4 minutes for normal-weight patients. How might this difference between means have occurred? One possibility is that physicians were influenced by the weight of the patients. On the other hand, perhaps by chance, the physicians who viewed charts of the obese patients tend to see patients for less time than the other physicians. Random assignment of charts does not ensure that the groups will be equal in all respects other than the chart they viewed. In fact, it is certain the groups differed in many ways by chance. The two groups could not have exactly the same mean age. Perhaps a physician’s age affects how long physicians see patients. There are many differences between the groups that could affect how long they view patients. With this in mind, is it plausible that these chance differences are responsible for the difference in times?

To assess the plausibility of the hypothesis that the difference in mean times is due to chance, we compute the probability of getting a difference as large or larger than the observed difference (31.4 – 24.7 = 6.7 minutes) if the difference were, in fact, due solely to chance. Using methods presented in later chapters, this probability can be computed to be 0.0057. Since this is such a low probability, we have confidence that the difference in times is due to the patient’s weight and is not due to chance.

In hypothesis testing, like the example described above, we see these four components:

  • We create two hypotheses: the null and the alternative.
  • We collect and analyze data.
  • We determine how likely or unlikely the original hypothesis is to occur based on probability.
  • We determine if we have enough evidence to support or reject the null hypothesis and draw conclusions.

Now let’s bring in some specific terminology.

Null hypothesis

In general, the null hypothesis , written H 0 (“H-naught”), is the idea that nothing is going on: there is no effect of our treatment, no relation between our variables, and no difference in our sample mean from what we expected about the population mean. The null hypothesis indicates that an apparent effect is due to chance. This is always our baseline starting assumption, and it is what we (typically) seek to reject .

In the Physicians’ Reactions example, the null hypothesis is that in the population of physicians, the mean time expected to be spent with obese patients is equal to the mean time expected to be spent with average-weight patients. This null hypothesis can be written as:

H 0 : μ obese = μ average

Alternative hypothesis

If the null hypothesis is rejected, then we will need some other explanation, which we call the alternative hypothesis , H A . The alternative hypothesis is simply the opposite of the null hypothesis. Thus, our alternative hypothesis is the mathematical way of stating our research question.  In general, the alternative hypothesis is that there is a significant effect of the treatment, significant relationship between variables, or significant difference between groups. The alternative hypothesis essentially shows evidence the findings are not due to chance.  It is also called the research hypothesis as this is the most common outcome a researcher is looking for: evidence of change, differences, or relationships. There are three options for setting up the alternative hypothesis, depending on where we expect the difference to lie. The alternative hypothesis always involves some kind of inequality (≠ “not equal”, > “greater than”, or < “less than”).

  • If we expect a specific direction of change/differences/relationships, which we call a directional hypothesis , then our alternative hypothesis takes the form based on the research question itself.  The directional hypothesis (2 directions) makes up 2 of the 3 alternative hypothesis options. 
  • The other alternative is to state there are differences/changes, or a relationship but not predict the direction. We use a non-directional hypothesis   (typically see ≠ for mathematical notation).

In the Physicians’ Reactions example, the directional alternative hypothesis must go in one or the other direction (i.e., more or less time with the obese patients). Based on our research question, the directional alternative hypothesis is that in the population of physicians, the mean time expected to be spent with obese patients is less than the mean time expected to be spent with average-weight patients. This alternative hypothesis can be written as:

H A : μ obese < μ average

In the Physicians’ Reactions example, the non-directional alternative hypothesis simply states that there is a difference in time spent with obese patients compared to average-weight patients. This alternative hypothesis can be written as:

H A : μ obese ≠ μ average

Critical values, p-values, and significance level

A low probability value casts doubt on the null hypothesis. How low must the probability value be in order to conclude that the null hypothesis is very unlikely? Although there is clearly no right or wrong answer to this question, it is conventional to say the null hypothesis is false if the probability value is less than 0.05. More conservative researchers conclude the null hypothesis is false only if the probability value is less than 0.01. When a researcher concludes that the null hypothesis is false, the researcher is said to have “rejected the null hypothesis.” The probability value below which the null hypothesis is rejected is called the α level or simply α (“alpha”). It is also called the significance level. If α is not explicitly specified, we often assume that α = 0.05.

The significance level (AKA alpha level) is a threshold we set before collecting data in order to determine whether or not we should reject the null hypothesis. We set this value beforehand to avoid biasing ourselves by viewing our results and then determining what criteria we should use. If our data produce values that meet or exceed this threshold, then we have sufficient evidence to reject the null hypothesis; if not, we fail to reject the null (we never “accept” the null).

There are two criteria we use to assess whether our data meet the thresholds established by our chosen significance level, and they both have to do with our discussions of probability and distributions. Recall that probability refers to the likelihood of an event, given some situation or set of conditions. In hypothesis testing, that situation is the assumption that the null hypothesis value is the correct value, or that there is no effect. The value laid out in H 0 is our condition under which we interpret our results. To reject this assumption, and thereby reject the null hypothesis, we need results that would be very unlikely if the null was true.

Now recall that values of z which fall in the tails of the standard normal distribution represent unlikely values. That is, the proportion of the area under the curve as or more extreme than z is very small as we get into the tails of the distribution. Our significance level corresponds to the area under the tail that is exactly equal to α: if we use our normal criterion of α = .05, then 5% of the area under the curve becomes what we call the critical region of the distribution. This is illustrated in Figure 1.


Figure 1. The critical region for a one-tailed test

The shaded rejection region takes up 5% of the area under the curve. Any result which falls in that region is sufficient evidence to reject the null hypothesis.

The critical region is bounded by a specific z-value, as is any area under the curve. In hypothesis testing, the value corresponding to a specific rejection region is called the critical value , z crit (“z-crit”). Finding the critical value works exactly the same as finding the z-score corresponding to any area under the curve. If we go to the normal table, we will find that the z-score corresponding to 5% of the area under the curve is equal to 1.645 (z = 1.64 corresponds to 0.0505 and z = 1.65 corresponds to 0.0495, so .05 is exactly in between them) if we go to the right and -1.645 if we go to the left. The direction must be determined by your alternative hypothesis, and drawing then shading the distribution is helpful for keeping directionality straight.

Suppose, however, that we want to do a non-directional test. We need to put the critical region in both tails, but we don’t want to increase the overall size of the rejection region (for reasons we will see later). To do this, we simply split it in half so that an equal proportion of the area under the curve falls in each tail’s rejection region. For α = .05, this means 2.5% of the area is in each tail, which, based on the z-table, corresponds to critical values of z crit = ±1.96. This is shown in Figure 2.


Figure 2. Two-tailed critical region

Thus, any z-score falling outside ±1.96 (greater than 1.96 in absolute value) falls in the critical region.

To formally test our hypothesis, we must be able to compare an obtained z-statistic to our critical z-value. We call this calculated or obtained z-score our z-test (z test ). If |z test | > |z crit | then our sample falls in the critical region (to see why, draw a line for z = 2.5 on Figure 1 or Figure 2), and so we reject H 0 . If |z test | < |z crit |, then we fail to reject H 0 .

Calculating our Test Statistic

[latex]z_{test}=\frac{M-\mu_M}{\sigma_M}[/latex]

Remember, according to the definition of the distribution of sample means:

[latex]\mu_M=\mu[/latex]

[latex]\sigma_M=\frac{\sigma}{\sqrt{n}}[/latex]

The test statistic is very useful when we are doing our calculations by hand. However, when we use computer software, it will report to us a p-value, which is simply the proportion of the area under the curve in the tails beyond our obtained test statistic. We can directly compare this p-value to α to test our null hypothesis: if p < α, we reject H 0 , but if p > α, we fail to reject H 0 .

As z test gets farther from zero, the corresponding area under the curve beyond z test gets smaller, so the p-value gets smaller as well. When the p-value drops below α, the probability of obtaining that result, or a more extreme result, under the condition that the null hypothesis is true is small enough that we reject H 0 .
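
As a concrete illustration (a sketch, not from the text), the p-value for an obtained z can be computed directly from the standard normal CDF:

```python
from statistics import NormalDist

def p_value(z, two_tailed=True):
    # Proportion of the standard normal curve at least as extreme as |z|.
    tail = 1 - NormalDist().cdf(abs(z))
    return 2 * tail if two_tailed else tail

print(round(p_value(1.96), 3))   # 0.05: z sitting exactly at the two-tailed cutoff
print(round(p_value(2.50), 4))   # 0.0124, smaller than alpha = .05
print(round(p_value(0.50), 2))   # 0.62, far larger than alpha
```

Notice the pattern the paragraph describes: the farther z is from zero, the smaller the p-value.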

When the null hypothesis is rejected, the effect is said to be statistically significant . For example, in the Physicians’ Reactions case study, the probability value is 0.0057 ( p = .0057, which means that p < .05). Therefore, the effect of obesity on physicians’ reactions is statistically significant and the null hypothesis that obesity makes no difference is rejected. It is very important to keep in mind that statistical significance means only that the null hypothesis of no effect is rejected; it does not mean that the effect is important, which is what “significant” usually means. When an effect is significant, you can have confidence the effect is not exactly zero. Finding that an effect is statistically significant does not tell you how large or important the effect is.

Do not confuse statistical significance with practical significance. A small effect can be highly significant if the sample size is large enough.

Why does the word “significant” in the phrase “statistically significant” mean something so different from other uses of the word? Interestingly, this is because the meaning of “significant” in everyday language has changed. It turns out that when the procedures for hypothesis testing were developed, something was “significant” if it signified something. Thus, finding that an effect is statistically significant signifies that the effect is most likely real and not due to chance. Over the years, the meaning of “significant” changed, leading to the potential misinterpretation.

Steps of the Hypothesis Testing Process

The process of testing hypotheses follows a simple four-step procedure. This process will be what we use for the remainder of the textbook, and although the hypotheses and statistics we use will change, this process will not.

Step 1: State the Hypotheses

Your hypotheses are the first things you need to lay out. Otherwise, there is nothing to test! You have to state the null hypothesis (which is what we test) and the alternative hypothesis (which is what we expect). These should be stated mathematically as they were presented above AND in words, explaining what each one means in terms of the research question.

Step 2: Find the Critical Values

Next, we formally lay out the criteria we will use to test our hypotheses. There are two pieces of information that inform our critical values: α, which determines how much of the area under the curve composes our critical region, and the directionality of the test, which determines where the region will be.

Step 3: Compute the Test Statistic

Once we have our hypotheses and the standards we use to test them, we can collect data and calculate our test statistic, in this case z . This step is where the vast majority of differences in future chapters will arise: different tests used for different data are calculated in different ways, but the way we use and interpret them remains the same.

Step 4: Make and Interpret the Decision

Finally, once we have our obtained test statistic, we can compare it to our critical value and decide whether we should reject or fail to reject the null hypothesis.

When we do this, we must interpret the decision in relation to our research question, stating what we concluded, what we based our conclusion on, and the specific statistics we obtained (in APA format usually).

Example: Movie Popcorn

Let’s see hypothesis testing in action by working through an example. Say that a movie theater owner likes to keep a very close eye on how much popcorn goes into each bag sold, so he knows that the average bag has 8 cups of popcorn and that this varies a little bit, about half a cup. That is, the known population mean is μ = 8.00 and the known population standard deviation is σ = 0.50. The owner wants to make sure that the newest employee is filling bags correctly, so over the course of a week he randomly assesses 25 bags filled by the employee to test for a difference (n = 25). He doesn’t want bags overfilled or underfilled. This scenario has all of the information we need to begin our hypothesis testing procedure.

First, we need to decide if we are looking for directional or non-directional hypotheses. In the scenario outlined above, the manager is interested in examining whether bags were “overfilled or underfilled,” which implies he wants to look at both directions rather than predict a particular one. Thus, this hypothesis test is non-directional. We will need both a null and an alternative hypothesis written both mathematically and in words. We’ll always start with the null hypothesis:

H 0 : There is no difference in the weight of popcorn bags from this employee

H 0 : Îź = 8.00

Notice that we phrase the hypothesis in terms of the population parameter Ο, which in this case would be the true average weight of bags filled by the new employee. Our assumption of no difference, the null hypothesis, is that this mean is exactly the same as the known population mean value we want it to match, 8.00.

Now let’s do the alternative:

H A : There is a difference in the weight of popcorn bags from this employee

H A : μ ≠ 8.00

In this case, we don’t know if the bags will be too full or not full enough, so we do a two-tailed (non-directional) alternative hypothesis that there is a difference.

Our critical values are based on two things: the directionality of the test and the significance level or alpha. (Remember, traditional alpha levels are .05, .01, and .001.) We decided in Step 1 that a two-tailed test is the appropriate directionality. We were given no information about the level of significance, so we will assume that α = 0.05. As stated earlier in the chapter, the critical values for a two-tailed z-test at α = 0.05 are z crit = ±1.96. We can now draw out our distribution (see Figure 3) so we can visualize the critical region and make sure it makes sense.


Figure 3. Critical region for z crit = ±1.96 shaded in red

Step 3: Calculate the Test Statistic

Now we come to our formal calculations. Let’s say that the manager collects data and finds that the average weight of this employee’s n = 25 popcorn bags is M = 7.75 cups.

We can now plug this value, along with the values presented in the original problem, into our equation for z test :

[latex]z_{test}=\frac{M-\mu}{\sigma_M}=\frac{M-\mu}{\frac{\sigma}{\sqrt{n}}}[/latex]

[latex]z_{test}=\frac{7.75-8.00}{\frac{0.5}{\sqrt{25}}}[/latex]

[latex]z_{test}=\frac{-0.25}{\frac{0.5}{5}}=\frac{-0.25}{0.10}=-2.50[/latex]

Our test statistic is z test = -2.50, which we can now draw onto our critical region (see Figure 4).


Figure 4. Test statistic location compared to the critical region

Looking at Figure 4, we can see that our obtained z-test falls in the critical region. We can also directly compare it to our critical value: in terms of absolute value, |-2.50| > |-1.96|, so we reject the null hypothesis. We can now write our conclusion:

Reject H 0 . Based on the sample of 25 bags, we can conclude that the average popcorn bag from this employee is significantly smaller ( M = 7.75 cups) than the average weight of popcorn bags at this movie theater, z = -2.50, p < 0.05.

When we write our conclusion, we write out the words to communicate what it actually means, but we also include the sample mean we calculated (the descriptive statistics like we’ve seen before) and the test statistic and p-value (the inferential statistics we are now learning to calculate). We don’t know the exact p-value, but we do know that because it was in the critical region and we rejected the null, it must be less than α.
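
For reference, the arithmetic above can be reproduced in a few lines (a sketch using only the values from the popcorn example):

```python
from math import sqrt

mu0, sigma, n, M = 8.00, 0.50, 25, 7.75   # movie popcorn example
z_crit = 1.96                             # two-tailed, alpha = .05

z_test = (M - mu0) / (sigma / sqrt(n))
print(round(z_test, 2))                   # -2.5
print(abs(z_test) > z_crit)               # True: reject H0
```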

Effect Size

When we reject the null hypothesis, we are stating that the difference we found was statistically significant, but we have mentioned several times that this tells us nothing about practical significance. To get an idea of the actual size of the difference, relationship, or effect that we found, we can compute a new inferential statistic called an effect size . Effect sizes give us an idea of how large, important, or meaningful a statistically significant effect is. For the difference between two means as in a z-statistic hypothesis test like we calculated here, our effect size is Cohen’s d.

Calculating Cohen’s d

[latex]d=\bigg |\frac{M-\mu}{\sigma}\bigg |[/latex]

Note: Because we don’t care about the direction of Cohen’s d, we take the absolute value of the calculation.

This is very similar to our formula for z, but we no longer take into account the sample size (since overly large samples can make it too easy to reject the null). Cohen’s d is interpreted in units of standard deviations, just like a z-score. For our hypothesis test example above, we rejected the null hypothesis, so we should next ask ourselves, how big of an effect or difference was there? Let’s find out.

[latex]d=\bigg |\frac{7.75-8.00}{0.50}\bigg |=\bigg |\frac{-0.25}{0.50}\bigg |=|-0.50|=0.50[/latex]

Cohen’s d is interpreted as small, moderate, or large. Specifically, d = 0.20 is small, d = 0.50 is moderate, and d = 0.80 is large. Obviously values can fall in between these guidelines, so we should use our best judgment and the context of the problem to make our final interpretation of size. Our effect size happened to be exactly equal to one of these, so we say that there was a moderate effect.

Table 1. Interpretation of Cohen's d

  • d = 0.20: small effect
  • d = 0.50: moderate effect
  • d = 0.80: large effect
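The calculation and interpretation of Cohen's d can be sketched in Python (the function names are ours; the values come from the popcorn example above):

```python
def cohens_d(sample_mean, pop_mean, pop_sd):
    """Cohen's d for a one-sample z test: |M - mu| / sigma.

    Note that, unlike z, the formula does not involve the sample size.
    """
    return abs(sample_mean - pop_mean) / pop_sd

def interpret(d):
    """Apply the 0.20 / 0.50 / 0.80 guidelines; values in between
    call for judgment, so these labels are only a rough guide."""
    if d < 0.20:
        return "negligible"
    if d < 0.50:
        return "small"
    if d < 0.80:
        return "moderate"
    return "large"

d = cohens_d(7.75, 8.00, 0.50)
print(d, interpret(d))  # 0.5 moderate
```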

Now we can update the report of our findings from above.

Reject H 0 . Based on the sample of 25 bags, we can conclude that the average popcorn bag from this employee is significantly smaller (M = 7.75 cups) than the average popcorn bag at this movie theater, z = -2.50, p < 0.05, d = 0.50.

Effect sizes are incredibly useful and provide important information and clarification that overcomes some of the weakness of hypothesis testing. Whenever you find a significant result, you should always calculate and report an effect size.

Other Considerations in Hypothesis Testing

There are several other considerations we need to keep in mind when performing hypothesis testing.

Errors in Hypothesis Testing

In the Physicians’ Reactions case study, the probability value associated with the significance test is 0.0057. Therefore, the null hypothesis was rejected, and it was concluded that physicians intend to spend less time with obese patients. Despite the low probability value, it is possible that the null hypothesis is actually true and that the large difference between sample means occurred by chance. If this is the case, then the conclusion that physicians intend to spend less time with obese patients is in error. This type of error is called a Type I error . More generally, a Type I error occurs when a hypothesis test results in the rejection of a true null hypothesis.

The Type I error rate is affected by the Îą level: the lower the Îą level, the lower the Type I error rate. It might seem that Îą is the overall probability of a Type I error. However, this is not correct. Instead, Îą is the probability of a Type I error given that the null hypothesis is true. If the null hypothesis is false, then it is impossible to make a Type I error.

The second type of error that can be made in significance testing is failing to reject a false null hypothesis. This kind of error is called a Type II error . Unlike a Type I error, a Type II error is not really an error. When a statistical test is not significant, it means that the data do not provide strong evidence that the null hypothesis is false. Lack of significance does not support the conclusion that the null hypothesis is true. Therefore, a researcher should not make the mistake of incorrectly concluding that the null hypothesis is true when a statistical test was not significant. Instead, the researcher should consider the test inconclusive. Contrast this with a Type I error in which the researcher erroneously concludes that the null hypothesis is false when, in fact, it is true.

A Type II error can only occur if the null hypothesis is false. If the null hypothesis is false, then the probability of a Type II error is called β (beta). The probability of correctly rejecting a false null hypothesis equals 1- β and is called power . Power is simply our ability to correctly detect an effect that exists. It is influenced by the size of the effect (larger effects are easier to detect), the significance level we set (making it easier to reject the null makes it easier to detect an effect, but increases the likelihood of a Type I Error), and the sample size used (larger samples make it easier to reject the null).
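The three influences on power described above can be sketched numerically. This is a sketch, not a general power calculator: it assumes a one-tailed one-sample z test with known Ďƒ, and uses only the Python standard library:

```python
from statistics import NormalDist

def ztest_power(d, n, alpha=0.05):
    """Approximate power of a one-tailed one-sample z test,
    where d is the true effect size (Cohen's d) and n the sample size."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)        # rejection cutoff under H0
    # Under H1, the z statistic is centered at d * sqrt(n)
    return 1 - z.cdf(z_crit - d * n ** 0.5)

print(round(ztest_power(0.5, 25), 3))              # moderate effect, n = 25
print(round(ztest_power(0.5, 100), 3))             # larger sample: more power
print(round(ztest_power(0.8, 25), 3))              # larger effect: more power
print(round(ztest_power(0.5, 25, alpha=0.10), 3))  # looser alpha: more power
```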

Summing up, when you perform a hypothesis test, there are four possible outcomes depending on the actual truth (or falseness) of the null hypothesis H 0 and the decision to reject or not. The outcomes are summarized in the following table:

Table 2. The four possible outcomes in hypothesis testing.

  • The decision is not to reject H 0 when H 0 is true (correct decision).
  • The decision is to reject H 0 when H 0 is true (incorrect decision known as a Type I error ).
  • The decision is not to reject H 0 when, in fact, H 0 is false (incorrect decision known as a Type II error ).
  • The decision is to reject H 0 when H 0 is false (correct decision).

Misconceptions in Hypothesis Testing

Misconceptions about hypothesis testing are common. This section lists three important ones.

Misconception #1: The probability value is the probability that the null hypothesis is false.

False. The probability value is the probability of a result as extreme or more extreme given that the null hypothesis is true. It is the probability of the data given the null hypothesis. It is not the probability that the null hypothesis is false.

Misconception #2: A low probability value indicates a large effect.

False. A low probability value indicates that the sample outcome (or one more extreme) would be very unlikely if the null hypothesis were true. A low probability value can occur with small effect sizes, particularly if the sample size is large.
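A quick numerical illustration (with made-up numbers, assuming a z test with known population parameters): a trivially small effect becomes highly significant once the sample is large enough.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical population and sample: mean 100, sd 15, observed mean 100.5
mu, sigma, M, n = 100, 15, 100.5, 10_000

z = (M - mu) / (sigma / sqrt(n))          # test statistic grows with n
p = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed p-value
d = abs(M - mu) / sigma                   # effect size does not depend on n

print(round(z, 2), round(p, 4), round(d, 3))  # significant p, tiny d
```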

Misconception #3: A non-significant outcome means that the null hypothesis is probably true.

False. A non-significant outcome means that the data do not conclusively demonstrate that the null hypothesis is false.

Misconception #4: A significant outcome means that you have proven your alternative hypothesis to be true.

False. A significant outcome means that you have found evidence to support your alternative hypothesis. We NEVER prove anything to be true! A future study may find something different.

Test Statistic Assumptions

There is one last consideration we will revisit with each test statistic throughout the book: assumptions. There are four main assumptions. These assumptions are often taken for granted when using prescribed data in a statistics course. In the real world, they would need to be examined and tested using statistical software.

Random Sampling

A sample is random when each person (or animal) in your population has an equal chance of being included in the sample; therefore, selection of any individual happens by chance rather than by choice. This reduces the chance that differences in characteristics or conditions may bias results. Remember that random samples are more likely to be representative of the population, so researchers can be more confident interpreting the results. Note: there is no test that statistical software can perform which assures random sampling has occurred, but following good sampling techniques helps to ensure your samples are random.

Independence

Statistical independence is a critical assumption for many statistical tests. Observations are often assumed to be independent of each other, but this assumption is frequently not met. Independence means the value of one observation does not influence or affect the value of other observations. Independent data items are not connected with one another in any way (unless you account for it in your study). Even the smallest dependence in your data can turn into heavily biased results (which may be undetectable) if you violate this assumption. Note: there is no test statistical software can perform that assures independence of the data, because this should be addressed during the research planning phase. Using a non-parametric test is often recommended if a researcher is concerned this assumption has been violated.

Normality

Normality assumes that the continuous variables (the dependent variable) used in the analysis are normally distributed. Normal distributions are symmetric around the center (the mean) and form a bell-shaped distribution. Normality is violated when sample data are skewed. With large enough sample sizes ( n > 30), violating the normality assumption should not cause major problems (remember the central limit theorem), but most statistical software includes a feature that can alert researchers to an assumption violation.

Equality of Variances AKA Homoscedasticity

Variance refers to the spread of scores from the mean. Many statistical tests assume that although different samples can come from populations with different means, they have the same variance. Equality of variances (i.e., homogeneity of variance) is violated when variances across different groups or samples are significantly different. Note: there is a feature in most statistical software to test for this.
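There is no single stdlib function for this check, but a rough screening can be sketched as a variance ratio (the data and the rule-of-thumb cutoff here are illustrative; real analyses would use a formal test such as Levene's, as the note above suggests):

```python
from statistics import variance

def variance_ratio(group_a, group_b):
    """Ratio of the larger to the smaller sample variance.
    Values near 1 are consistent with homogeneity of variance;
    a common rule of thumb flags ratios above about 4."""
    va, vb = variance(group_a), variance(group_b)
    return max(va, vb) / min(va, vb)

a = [7.2, 7.8, 8.1, 7.5, 7.9]   # made-up sample scores
b = [6.9, 8.3, 7.4, 8.0, 7.6]
print(round(variance_ratio(a, b), 2))
```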

Glossary

Hypothesis testing: a statistical inference procedure for determining whether a given proposition about a population parameter should be rejected on the basis of observed sample data.

Null hypothesis: a statement that a study will find no meaningful differences between the groups or conditions under investigation.

Alternative hypothesis: a statement that a study will find meaningful differences or relationships between the variables under investigation.

Directional (one-tailed) hypothesis: a scientific prediction stating (a) that an effect will occur and (b) whether that effect will specifically increase or specifically decrease, depending on changes to the independent variable.

Non-directional (two-tailed) hypothesis: a scientific prediction stating that an effect, difference, or relationship will occur but not whether it will be an increase or a decrease.

Critical region: the portion of a probability distribution containing the values of a test statistic that would result in rejection of the null hypothesis in favor of its corresponding alternative hypothesis.

Critical value: a value used to make decisions about whether a test result is statistically meaningful.

Test statistic: the numerical result of a statistical test, which is used to determine statistical significance and evaluate the viability of a hypothesis.

Statistical significance: the degree to which a research outcome cannot reasonably be attributed to the operation of chance or random factors. It is determined during significance testing and given by a critical p value, which is the probability of obtaining the observed data if the null hypothesis (i.e., of no significant relationship between variables) were true.

Effect size: a measure of the magnitude or meaningfulness of a difference, relationship, or effect in a study. Effect sizes are often interpreted as indicating the practical or meaningful significance of a research finding.

Type I error: rejecting the null hypothesis when it should not be rejected. Researchers make this error when they believe they have found an effect, difference, or relationship that does not actually exist. The probability of committing a Type I error is called the significance level or alpha (Îą) level.

Type II error: failing to reject the null hypothesis when it should be rejected. Researchers make this error if they conclude that a particular effect, difference, or relationship does not exist when in fact it does. The probability of committing a Type II error is called the beta (β) level of a test. Conversely, the probability of not committing a Type II error (i.e., of detecting a genuinely significant difference, effect, or relationship) is called the power of the test, where power = 1 - β.

Power: a statistical measure of how effective a statistical procedure is at identifying real differences in a study. It is the probability that use of the procedure will lead to the null hypothesis being rejected.

Independence: the condition in which the occurrence of one event makes it neither more nor less probable that another event will occur.

Normality: the condition in which a data set presents a normal distribution of values.

Homoscedasticity (equality of variances): the statistical assumption of equal variance, meaning that the average squared distance of a score from the mean is the same across all groups sampled in a study.

Introduction to Statistics for the Social Sciences Copyright © 2021 by Jennifer Ivie; Alicia MacKay is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Introduction to Data Science I & II

Hypothesis Testing #

Hypothesis testing is about choosing between two views, called hypotheses , on how data were generated (for example, SSR = 1 or SSR = 1.05, where SSR is the secondary sex ratio we defined in the previous section). The hypotheses, called null and alternative , should be specified before doing the analysis. Testing is a way to select the hypothesis that is better supported by the data. Note that the null hypothesis corresponds to what we called “the data-generating model” in the previous section.

Ingredients of hypothesis testing:

A null hypothesis \(H_0\) (e.g. SSR=1.05);

An alternative hypothesis \(H_A\) (e.g. SSR \(\neq\) 1.05);

A test statistic;

A decision or a measure of significance that is obtained from the test statistic and its distribution under the null hypothesis.

In the previous section we investigated \(H_0\) : SSR=1 by simulating from the binomial distribution. The natural alternative there was \(H_A:\) SSR \(\neq\) 1. The test statistic we used was the number of boys.

Let’s look in more detail at the components of a hypothesis test on a subset of these data. Assume that someone claims that Illinois has a different SSR based on what they have seen in a hospital they work for. You decide to investigate this using the natality data we introduced above. Before looking at the data, you need to decide on the first three ingredients:

The null hypothesis is generally the default view (generally believed to be true), and it needs to lead to clear rules on how the data were generated. In this case, it makes sense to declare that \(H_0:\) SSR_IL=1.05 (the probability of having a boy in Illinois is 0.512, which corresponds to a secondary sex ratio of 1.05).

The alternative hypothesis should be the opposite of the null, but it can take different forms (for example, SSR_IL<1.05 or SSR_IL>1.05). Here, because no additional information was provided, it is natural to use \(H_A:\) SSR_IL \(\neq\) 1.05. Note that the choice of alternative will impact the measure of significance discussed below.

The test statistic is the summary of the data that will be used for investigating consistency with the null hypothesis. We aim to choose the statistic that is most informative for the hypotheses we are investigating. We will use the observed SSR in Illinois as the test statistic.

Below are the cells that show the data and the histogram of test statistics generated under \(H_0\) .

[Figure: histogram of the test statistic simulated under \(H_0\), with the observed statistic marked as a red dot.]

In the above histogram, the observed statistic (indicated by the red dot) appears to be a natural realization from the distribution summarized by the histogram. There seems to be no evidence against \(H_0\) . A more complete conclusion: using the number of boys as the test statistic, we find no evidence to reject \(H_0\) (no evidence that the data are inconsistent with SSR=1.05).

Significance as measured by the p-value #

P-values capture the consistency of the data (test statistic) with the null hypothesis (distribution of the statistic under the null).

The p-value is the chance, under the null hypothesis , that the test statistic is equal to the observed value or is further in the direction of the alternative.

It is important to use the specified alternative hypothesis correctly when determining the tail or tails of the null distribution of the statistic.

The decision is made using the null distribution of the test statistic (a probability distribution ); we will use an approximation given by an empirical distribution . The p-value is the tail area of this distribution.

In the above example, the proportion of simulations that lead to more extreme values than the one observed is:
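The code cell and its output are not reproduced here; a minimal sketch of that kind of simulation, with hypothetical counts standing in for the Illinois natality data, might look like:

```python
import random

random.seed(0)

# Hypothetical counts standing in for the Illinois natality data (the
# actual observed value comes from the data cell not reproduced here)
n_births = 1000
obs_boys = 540
p0 = 0.512                     # P(boy) under H0: SSR = 1.05

obs_dev = abs(obs_boys / n_births - p0)

sims = 2000
extreme = 0
for _ in range(sims):
    boys = sum(random.random() < p0 for _ in range(n_births))
    # Two-sided alternative: count simulated statistics at least as far from p0
    if abs(boys / n_births - p0) >= obs_dev:
        extreme += 1

p_value = extreme / sims
print(p_value)
```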

Interpretation of p-values #

When \(H_0\) is true : p-value is (approximately) distributed uniform on the interval [0,1]:

about half of p-values are larger than 0.5

about 10% are smaller than 0.1

about 5% are smaller than 0.05

A small p-value (typically smaller than 0.05 or 0.01) indicates evidence against the null hypothesis (the smaller the p-value, the stronger the evidence). A large p-value indicates no evidence (or weak evidence) against the null.
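The uniform behavior of p-values under a true null can be checked by simulation. Below is a sketch under a simple model (normal data with known Ďƒ = 1, two-sided z test):

```python
import random
from statistics import NormalDist

random.seed(1)
norm = NormalDist()

def null_pvalue(n=30):
    """Two-sided z-test p-value for a sample drawn with H0 true (mu = 0)."""
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(sample) / n) / (1 / n ** 0.5)   # sample mean over its standard error
    return 2 * (1 - norm.cdf(abs(z)))

pvals = [null_pvalue() for _ in range(2000)]
print(sum(p > 0.5 for p in pvals) / len(pvals))   # roughly 0.5
print(sum(p < 0.1 for p in pvals) / len(pvals))   # roughly 0.1
print(sum(p < 0.05 for p in pvals) / len(pvals))  # roughly 0.05
```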


SAS

Introduction to Statistical Analysis: Hypothesis Testing

This course is part of SAS Statistical Business Analyst Professional Certificate


Instructor: Jordan Bakerman


Skills you'll gain

  • Multivariate Time Series Analysis
  • Multivariate Analysis
  • Multivariate Statistics
  • Predictive Modelling


There are 4 modules in this course

This introductory course is for SAS software users who perform statistical analyses using SAS/STAT software. The focus is on t tests, ANOVA, and linear regression, and includes a brief introduction to logistic regression.

Course Overview and Data Setup

In this module you learn about the course and the data you analyze in this course. Then you set up the data you need to do the practices in the course.

What's included

2 videos 5 readings

2 videos • Total 12 minutes

  • Welcome and Meet the Instructor • 1 minute • Preview module
  • Demo: Exploring Ames Housing Data • 10 minutes

5 readings • Total 52 minutes

  • Learner Prerequisites • 1 minute
  • Access SAS Software for this Course • 10 minutes
  • Follow These Instructions to Set Up Data for This Course (REQUIRED) • 30 minutes
  • Completing Demos and Practices • 10 minutes
  • Using Forums and Getting Help • 1 minute

Introduction and Review of Concepts

In this module you learn about the models required to analyze different types of data and the difference between explanatory vs predictive modeling. Then you review fundamental statistical concepts, such as the sampling distribution of a mean, hypothesis testing, p-values, and confidence intervals. After reviewing these concepts, you apply one-sample and two-sample t tests to data to confirm or reject preconceived hypotheses.

17 videos 2 readings 9 quizzes

17 videos • Total 41 minutes

  • Overview • 1 minute • Preview module
  • Statistical Modeling: Types of Variables • 1 minute
  • Overview of Models • 3 minutes
  • Explanatory versus Predictive Modeling • 1 minute
  • Population Parameters and Sample Statistics • 1 minute
  • Normal (Gaussian) Distribution • 2 minutes
  • Standard Error of the Mean • 0 minutes
  • Confidence Intervals • 2 minutes
  • Statistical Hypothesis Test • 4 minutes
  • p-Value: Effect Size and Sample Size Influence • 3 minutes
  • Scenario • 0 minutes
  • Performing a t Test • 4 minutes
  • Demo: Performing a One-Sample t Test Using PROC TTEST • 3 minutes
  • Scenario • 1 minute
  • Assumptions for the Two-Sample t Test • 2 minutes
  • Testing for Equal and Unequal Variances • 2 minutes
  • Demo: Performing a Two-Sample t Test Using PROC TTEST • 4 minutes

2 readings • Total 20 minutes

  • Parameters and Statistics • 10 minutes
  • Normal Distribution • 10 minutes

9 quizzes • Total 100 minutes

  • Question 1.01 • 5 minutes
  • Question 1.02 • 5 minutes
  • Question 1.03 • 5 minutes
  • Question 1.04 • 5 minutes
  • Question 1.05 • 5 minutes
  • Practice - Using PROC TTEST to Perform a One-Sample t Test • 20 minutes
  • Question 1.06 • 5 minutes
  • Practice - Using PROC TTEST to Compare Groups • 20 minutes
  • Introduction and Review of Concepts • 30 minutes

ANOVA and Regression

In this module you learn to use graphical tools that can help determine which predictors are likely or unlikely to be useful. Then you learn to augment these graphical explorations with correlation analyses that describe linear relationships between potential predictors and our response variable. After you determine potential predictors, tools like ANOVA and regression help you assess the quality of the relationship between the response and predictors.

29 videos 2 readings 14 quizzes

29 videos • Total 69 minutes

  • Identifying Associations in ANOVA with Box Plots • 1 minute
  • Demo: Exploring Associations Using PROC SGPLOT • 1 minute
  • Identifying Associations in Linear Regression with Scatter Plots • 1 minute
  • Demo: Exploring Associations Using PROC SGSCATTER • 2 minutes
  • The ANOVA Hypothesis • 1 minute
  • Partitioning Variability in ANOVA • 2 minutes
  • Coefficient of Determination • 1 minute
  • F Statistic and Critical Values • 1 minute
  • The ANOVA Model • 2 minutes
  • Demo: Performing a One-Way ANOVA Using PROC GLM • 6 minutes
  • Multiple Comparison Methods • 2 minutes
  • Tukey's and Dunnett's Multiple Comparison Methods • 1 minute
  • Diffograms and Control Plots • 1 minute
  • Demo: Performing a Post Hoc Pairwise Comparison Using PROC GLM • 6 minutes
  • Using Correlation to Measure Relationships between Continuous Variables • 1 minute
  • Hypothesis Testing for a Correlation • 1 minute
  • Avoiding Common Errors When Interpreting Correlations • 5 minutes
  • Demo: Producing Correlation Statistics and Scatter Plots Using PROC CORR • 6 minutes
  • The Simple Linear Regression Model • 1 minute
  • How SAS Performs Simple Linear Regression • 1 minute
  • Comparing the Regression Model to a Baseline Model • 2 minutes
  • Hypothesis Testing and Assumptions for Linear Regression • 1 minute
  • Demo: Performing Simple Linear Regression Using PROC REG • 7 minutes
  • What Does a CLASS Statement Do? • 10 minutes
  • Correlation Analysis and Model Building • 10 minutes

14 quizzes • Total 155 minutes

  • Question 2.01 • 5 minutes
  • Question 2.02 • 5 minutes
  • Question 2.03 • 5 minutes
  • Question 2.04 • 5 minutes
  • Practice - Performing a One-Way ANOVA • 20 minutes
  • Question 2.05 • 5 minutes
  • Question 2.06 • 5 minutes
  • Practice - Using PROC GLM to Perform Post Hoc Pairwise Comparisons • 20 minutes
  • Question 2.07 • 5 minutes
  • Question 2.08 • 5 minutes
  • Practice - Describing the Relationship between Continuous Variables • 20 minutes
  • Question 2.09 • 5 minutes
  • Practice - Using PROC REG to Fit a Simple Linear Regression Model • 20 minutes
  • ANOVA and Regression • 30 minutes

More Complex Linear Models

In this module you expand the one-way ANOVA model to a two-factor analysis of variance and then extend simple linear regression to multiple regression with two predictors. After you understand the concepts of two-way ANOVA and multiple linear regression with two predictors, you'll have the skills to fit and interpret models with many variables.

13 videos 1 reading 5 quizzes

13 videos • Total 43 minutes

  • Applying the Two-Way ANOVA Model • 3 minutes
  • Demo: Performing a Two-Way ANOVA Using PROC GLM • 7 minutes
  • Interactions • 3 minutes
  • Demo: Performing a Two-Way ANOVA With an Interaction Using PROC GLM • 5 minutes
  • Demo: Performing Post-Processing Analysis Using PROC PLM • 4 minutes
  • The Multiple Linear Regression Model • 2 minutes
  • Hypothesis Testing for Multiple Regression • 1 minute
  • Multiple Linear Regression versus Simple Linear Regression • 2 minutes
  • Adjusted R-Square • 1 minute
  • Demo: Fitting a Multiple Linear Regression Model Using PROC REG • 7 minutes

1 reading • Total 10 minutes

  • The STORE Statement • 10 minutes

5 quizzes • Total 80 minutes

  • Question 3.01 • 5 minutes
  • Practice - Performing a Two-Way ANOVA Using PROC GLM • 20 minutes
  • Question 3.02 • 5 minutes
  • Practice - Performing Multiple Regression Using PROC REG • 20 minutes
  • More Complex Linear Models • 30 minutes


Social Sci LibreTexts

7: Introduction to Hypothesis Testing

  • 7.1: Logic and Purpose of Hypothesis Testing
  • 7.2: The Probability Value
  • 7.3: The Null Hypothesis
  • 7.4: The Alternative Hypothesis
  • 7.5: Critical values, p-values, and significance level
  • 7.6: Steps of the Hypothesis Testing Process
  • 7.7: Example- Movie Popcorn
  • 7.8: Effect Size
  • 7.9: Example- Office Temperature
  • 7.10: Example- Different Significance Level
  • 7.11: Other Considerations in Hypothesis Testing
  • 7.12: Exercises
  • 7.13: Answers to Odd- Numbered Exercises


Resources: Course Assignments

Module 10 Assignment: Hypothesis Testing for the Population Mean

The purpose of this activity is to give you guided practice in going through the process of a t-test for the population mean, and teach you how to carry out this test using statistical software.

Background:

A group of 75 college students from a certain liberal arts college were randomly sampled and asked about the number of alcoholic drinks they have in a typical week. The file containing the data is linked below. The purpose of this  study  was to compare the drinking habits of the students at the college to the drinking habits of college students in general. In particular, the dean of students, who initiated this study, would like to check whether the mean number of alcoholic drinks that students at his college have in a typical week differs from the mean of U.S. college students in general, which is estimated to be 4.73.

Question 1:

Let Îź be the mean number of alcoholic beverages that students in the college drink in a typical week. State the hypotheses that are being tested in this problem.

Question 2:

Here is a histogram of the data. Can we safely use the t-test with this data?

Instructions

Click on the link corresponding to your statistical package to see instructions for completing the activity, and then answer the questions below.

R  |  StatCrunch  |  Minitab  |  Excel  |  TI Calculator

Question 3:

State the test statistic, interpret its value and show how it was found.

Question 4:

Based on the P-value, draw your conclusions in context.

Question 5:

What would your conclusions be if the dean of students suspected that the mean number of alcoholic drinks that students in the college consume in a typical week is  lower  than the mean of U.S. college students in general? In other words, if this were a test of the hypotheses:

H 0 : Îź = 4.73 drinks per week

H a : Îź < 4.73 drinks per week
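In Python, both the two-sided and this one-sided version of the test can be run with scipy's `ttest_1samp` (the `alternative` argument requires scipy >= 1.6). Since the actual data file is not reproduced here, the sketch below uses simulated drink counts:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
drinks = rng.poisson(lam=3.2, size=75)   # stand-in for the real data file

# Two-sided test: H0: mu = 4.73 vs Ha: mu != 4.73
t_two, p_two = stats.ttest_1samp(drinks, popmean=4.73)

# One-sided test: Ha: mu < 4.73
t_one, p_one = stats.ttest_1samp(drinks, popmean=4.73, alternative="less")

# The t statistic is the same; only the p-value changes with the alternative
print(round(float(t_two), 2), round(float(p_two), 4))
print(round(float(t_one), 2), round(float(p_one), 4))
```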

Question 6:

Now suppose that instead of the 75 students having been randomly selected from the entire student body, the 75 students had been randomly selected  only  from the engineering classes at the college (for the sake of convenience).

Address the following two issues regarding the effect of such a change in the study design:

a. Would we still be mathematically justified in using the t-test to draw conclusions, as we did previously?

b. Would the resulting conclusions still address the question of interest (which, remember, was to investigate the drinking habits of the students at the college as a whole)?

Concepts in Statistics Copyright © 2023 by CUNY School of Professional Studies is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License , except where otherwise noted.


IMAGES

  1. Hypothesis Testing- Meaning, Types & Steps

    introduction to hypothesis testing assignment

  2. Introduction to Hypothesis Testing

    introduction to hypothesis testing assignment

  3. PPT

    introduction to hypothesis testing assignment

  4. PPT

    introduction to hypothesis testing assignment

  5. (PDF) Hypothesis testing Assignment 3

    introduction to hypothesis testing assignment

  6. Hypothesis Testing Solved Examples(Questions and Solutions)

    introduction to hypothesis testing assignment

VIDEO

  1. Hypotheses

  2. Week 9

  3. Hypothesis Testing Assignment Answers

  4. Introduction to Hypothesis Testing

  5. Research Question and Hypothesis Assignment

  6. t-TEST INTRODUCTION- HYPOTHESIS TESTING VIDEO-15

COMMENTS

  1. Introduction to Hypothesis Testing assignment Flashcards

    The science teachers inspect the homework assignments from a random sample of 50 students and find that 24 are complete. Fail to reject the null hypothesis. A nationwide poll revealed that 28% of teens said they seldom or never argue with their parents. Ralph wonders if this result would be similar at his large high school, so he surveys a ...

  2. Hypothesis Testing

    Present the findings in your results and discussion section. Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps. Table of contents. Step 1: State your null and alternate hypothesis. Step 2: Collect data. Step 3: Perform a statistical test.

  3. 8.1.1: Introduction to Hypothesis Testing Part 1

    Review. In a hypothesis test, sample data is evaluated in order to arrive at a decision about some type of claim.If certain conditions about the sample are satisfied, then the claim can be evaluated for a population. In a hypothesis test, we: Evaluate the null hypothesis, typically denoted with \(H_{0}\).The null is not rejected unless the hypothesis test shows otherwise.

  4. Chapter 7: Introduction to Hypothesis Testing

    This chapter lays out the basic logic and process of hypothesis testing. We will perform z tests, which use the z score formula from Chapter 6 and data from a sample mean to make an inference about a population.. Logic and Purpose of Hypothesis Testing. A hypothesis is a prediction that is tested in a research study. The statistician R. A. Fisher explained the concept of hypothesis testing ...

  5. Introduction to Hypothesis Testing with Examples

    Likelihood ratio. In the likelihood ratio test, we reject the null hypothesis if the ratio is above a certain value i.e, reject the null hypothesis if L(X) > 𝜉, else accept it. 𝜉 is called the critical ratio.. So this is how we can draw a decision boundary: we separate the observations for which the likelihood ratio is greater than the critical ratio from the observations for which it ...

  6. PDF Lecture 14: Introduction to hypothesis testing (v2) Ramesh Johari

    In hypothesis testing, we quantify our uncertainty by asking whether it is likely that data came from a particular distribution. We will focus on the following common type of hypothesis testing scenario: IThe data Y come from some distribution f(Yj ), with parameter . IThere are two possibilities for : either = . 0, or 6= .

  7. PDF Introduction to Hypothesis Testing

    8.2 FOUR STEPS TO HYPOTHESIS TESTING The goal of hypothesis testing is to determine the likelihood that a population parameter, such as the mean, is likely to be true. In this section, we describe the four steps of hypothesis testing that were briefly introduced in Section 8.1: Step 1: State the hypotheses. Step 2: Set the criteria for a decision.

  8. 9.1: Introduction to Hypothesis Testing

    A statistician will make a decision about these claims. This process is called " hypothesis testing ." A hypothesis test involves collecting data from a sample and evaluating the data. Then, the statistician makes a decision as to whether or not there is sufficient evidence, based upon analyses of the data, to reject the null hypothesis.

  9. 9.1: Introduction to Hypothesis Testing

    This page titled 9.1: Introduction to Hypothesis Testing is shared under a CC BY 2.0 license and was authored, remixed, and/or curated by Kyle Siegrist ( Random Services) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request. In hypothesis testing, the goal is ...

  10. Introduction to Hypothesis Testing

    About this course. Get started with hypothesis testing by examining a one-sample t-test and binomial tests — both used for drawing inference about a population based on a smaller sample from that population.

  11. Introduction to Hypothesis Testing

    Given a claim about a population, we will learn to determine the null and alternative hypotheses. We will recognize the logic behind a hypothesis test and how it relates to the P-value as well as recognizing type I and type II errors. These are powerful tools in exploring and understanding data in real-life. Concepts in Statistics.

  12. Introduction to Hypothesis Testing

    A hypothesis test consists of five steps: 1. State the hypotheses. State the null and alternative hypotheses. These two hypotheses need to be mutually exclusive, so if one is true then the other must be false. 2. Determine a significance level to use for the hypothesis. Decide on a significance level.

  13. Introduction to Hypothesis Testing

    Introduction to Hypothesis Testing. III. Unit 3. 10. One-sample t-Test. 11. Independent-samples t-Test. 12. Paired-Samples t-Test. ... The last two goals are related to the idea of hypothesis testing and inferential statistics, while the first is clearly tied to descriptive statistics. ... Random assignment of charts does not ensure that the ...

  14. Hypothesis testing

    Dan L. Nicolae. Hypothesis testing can be thought of as a way to investigate the consistency of a dataset with a model, where a model is a set of rules that describe how data are generated. The consistency is evaluated using ideas from probability and probability distributions. The consistency question in the above diagram is short for "Is it ...

  15. Hypothesis testing

    Hypothesis testing#. Hypothesis testing is about choosing between two views, called hypotheses, on how data were generated (for example, SSR=1 or SSR =1.05 where SSR is the secondary sex ratio we defined in the previous section).Hypotheses, called null and alternative, should be specified before doing the analysis.Testing is a way to select the hypothesis that is better supported by the data.

  16. 11.1: Introduction to Hypothesis Testing

    To assess the plausibility of the hypothesis that the difference in mean times is due to chance, we compute the probability of getting a difference as large or larger than the observed difference ( 31.4 − 24.7 = 6.7 31.4 − 24.7 = 6.7 minutes) if the difference were, in fact, due solely to chance.

  17. PDF Introduction to Hypothesis Testing

    The coin example was an exampled of a two-tailed hypothesis test, because we would have rejected the Null hypothesis had the coin been been biased towards heads OR tails Alternative Hypothesis Rejection Region for Level Test H 1: > 0 H 1: < 0 H 1: 6= 0 z z ↵ z z ↵ (z z ↵/2) or (z z ↵/2) ↵ → →-7 d & I f dh /

  18. Introduction to Hypothesis Testing

    A hypothesis is a proposed explanation for a phenomenon. In the context of statistical hypothesis tests the term hypothesis is a statement about something that is supposed to be true. A hypothesis test involves two hypotheses: the null hypothesis and the alternative hypothesis. The null hypothesis \ ( (H_0)\) is a statement to be tested.

  19. Introduction to Statistical Analysis: Hypothesis Testing

    Introduction and Review of Concepts. In this module you learn about the models required to analyze different types of data and the difference between explanatory vs predictive modeling. Then you review fundamental statistical concepts, such as the sampling distribution of a mean, hypothesis testing, p-values, and confidence intervals.

  20. 7: Introduction to Hypothesis Testing

    Workbench. PSYCH 330: Introduction to Psychology Statistics. 7: Introduction to Hypothesis Testing. Expand/collapse global location.

  21. 7: Introduction to Hypothesis Testing

    7.E: Introduction to Hypothesis Testing (Exercises) This chapter lays out the basic logic and process of hypothesis testing. We will perform z-tests, which use the z-score formula from chapter 6 and data from a sample mean to make an inference about a ….

  22. Introduction to Hypothesis Testing

    To assess the plausibility of the hypothesis that the difference in mean times is due to chance, we compute the probability of getting a difference as large or larger than the observed difference (31.4 - 24.7 = 6.7 minutes) if the difference were, in fact, due solely to chance.

  23. Module 10 Assignment: Hypothesis Testing for the Population Mean

    A group of 75 college students from a certain liberal arts college were randomly sampled and asked about the number of alcoholic drinks they have in a typical week. The file containing the data is linked below. The purpose of this study was to compare the drinking habits of the students at the college to the drinking habits of college students ...