Marketing Research Design & Analysis 2019

5 Hypothesis testing

This chapter is primarily based on Field, A., Miles, J., & Field, Z. (2012): Discovering Statistics Using R. Sage Publications, chapters 5, 9, 15, 18.

You can download the corresponding R-Code here

5.1 Introduction

We test hypotheses because we are confined to taking samples – we rarely work with the entire population. In the previous chapter, we introduced the standard error (i.e., the standard deviation of a large number of hypothetical samples) as an estimate of how well a particular sample represents the population. We also saw how we can construct confidence intervals around the sample mean \(\bar x\) by computing \(SE_{\bar x}\) as an estimate of \(\sigma_{\bar x}\) using \(s\) as an estimate of \(\sigma\) and calculating the 95% CI as \(\bar x \pm 1.96 * SE_{\bar x}\). Although we do not know the true population mean ( \(\mu\) ), we might have a hypothesis about it, and this would tell us what the corresponding sampling distribution looks like. Based on the sampling distribution of the hypothesized population mean, we could then determine the probability of obtaining a given sample, assuming that the hypothesis is true.

Let us again begin by assuming we know the entire population, using the example of music listening times among students from the previous chapter. As a reminder, the following plot shows the distribution of music listening times in the population of WU students.


In this example, the population mean ( \(\mu\) ) is equal to 19.98, and the population standard deviation \(\sigma\) is equal to 14.15.

5.1.1 The null hypothesis

Let us assume that we were planning to take a random sample of 50 students from this population and our hypothesis was that the mean listening time is equal to some specific value \(\mu_0\) , say \(10\) . This would be our null hypothesis . The null hypothesis refers to the statement that is being tested and is usually a statement of the status quo, one of no difference or no effect. In our example, the null hypothesis would state that there is no difference between the true population mean \(\mu\) and the hypothesized value \(\mu_0\) (in our example \(10\) ), which can be expressed as follows:

\[ H_0: \mu = \mu_0 \] When conducting research, we are usually interested in providing evidence against the null hypothesis. If we then observe sufficient evidence against it, our estimate is said to be significant. If the null hypothesis is rejected, this is taken as support for the alternative hypothesis . The alternative hypothesis assumes that some difference exists, which can be expressed as follows:

\[ H_1: \mu \neq \mu_0 \] Accepting the alternative hypothesis in turn will often lead to changes in opinions or actions. Note that while the null hypothesis may be rejected, it can never be accepted based on a single test. If we fail to reject the null hypothesis, it means that we simply haven’t collected enough evidence against the null hypothesis to disprove it. In classical hypothesis testing, there is no way to determine whether the null hypothesis is true. Hypothesis testing provides a means to quantify to what extent the data from our sample is in line with the null hypothesis.

In order to quantify the concept of “sufficient evidence” we look at the theoretical distribution of the sample means given our null hypothesis and the sample standard error. Using the available information we can infer the sampling distribution for our null hypothesis. Recall that the standard deviation of the sampling distribution (i.e., the standard error of the mean) is given by \(\sigma_{\bar x}={\sigma \over \sqrt{n}}\) , and thus can be computed as follows:
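A minimal sketch in R, using the population values given above:

```r
sigma <- 14.15  # population standard deviation (given above)
n <- 50         # size of the sample we plan to take
se_xbar <- sigma / sqrt(n)
se_xbar  # standard error of the mean, approx. 2
```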

Since we know from the central limit theorem that the sampling distribution is normal for large enough samples, we can now visualize the expected sampling distribution if our null hypothesis was in fact true (i.e., if there was no difference between the true population mean and the hypothesized mean of 10).


We also know that 95% of the probability is within 1.96 standard deviations from the mean. Values further away from the mean than that are rather unlikely if our hypothesis about the population mean is indeed true. This is shown by the shaded area, also known as the “rejection region”. To test our hypothesis that the population mean is equal to \(10\), let us take a random sample from the population.


The mean listening time in the sample (black line) \(\bar x\) is 18.59. We can already see from the graphic above that such a value is rather unlikely under the hypothesis that the population mean is \(10\) . Intuitively, such a result would therefore provide evidence against our null hypothesis. But how could we quantify specifically how unlikely it is to obtain such a value and decide whether or not to reject the null hypothesis? Significance tests can be used to provide answers to these questions.

5.1.2 Statistical inference on a sample

5.1.2.1 Test statistic

5.1.2.1.1 z-scores

Let’s go back to the sampling distribution above. We know that 95% of all values will fall within 1.96 standard deviations from the mean. So if we could express the distance between our sample mean and the null hypothesis in terms of standard deviations, we could make statements about the probability of getting a sample mean of the observed magnitude (or more extreme values). Essentially, we would like to know how many standard deviations ( \(\sigma_{\bar x}\) ) our sample mean ( \(\bar x\) ) is away from the population mean if the null hypothesis was true ( \(\mu_0\) ). This can be formally expressed as follows:

\[ \bar x- \mu_0 = z \sigma_{\bar x} \]

In this equation, z will tell us how many standard deviations the sample mean \(\bar x\) is away from the null hypothesis \(\mu_0\) . Solving for z gives us:

\[ z = {\bar x- \mu_0 \over \sigma_{\bar x}}={\bar x- \mu_0 \over \sigma / \sqrt{n}} \]

This standardized value (or “z-score”) is also referred to as a test statistic . Let’s compute the test statistic for our example above:
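A sketch of this computation, reusing `sigma` and `n` from above and the sample mean reported in the text:

```r
x_bar <- 18.59  # sample mean (from the sample above)
mu_0  <- 10     # hypothesized population mean
z <- (x_bar - mu_0) / (sigma / sqrt(n))
z  # approx. 4.29
```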

To make a decision on whether the difference can be deemed statistically significant, we now need to compare this calculated test statistic to a meaningful threshold. In order to do so, we need to decide on a significance level \(\alpha\) , which expresses the probability of finding an effect that does not actually exist (i.e., Type I Error). You can find a detailed discussion of this point at the end of this chapter. For now, we will adopt the widely accepted significance level of 5% and set \(\alpha\) to 0.05. The critical value for the normal distribution and \(\alpha\) = 0.05 can be computed using the qnorm() function as follows:
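For example:

```r
alpha <- 0.05
crit_z <- qnorm(1 - alpha / 2)  # two-sided critical value
crit_z  # approx. 1.96
```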

We use 0.975 and not 0.95 since we are running a two-sided test and need to account for the rejection region at the other end of the distribution. Recall that for the normal distribution, 95% of the total probability falls within 1.96 standard deviations of the mean, so that higher (absolute) values provide evidence against the null hypothesis. Generally, we speak of a statistically significant effect if the (absolute) calculated test statistic is larger than the (absolute) critical value. We can easily check if this is the case in our example:
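Reusing the values from above:

```r
abs(z) > crit_z  # TRUE: the test statistic exceeds the critical value
```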

Since the absolute value of the calculated test statistic is larger than the critical value, we would reject \(H_0\) and conclude that the true population mean \(\mu\) is significantly different from the hypothesized value \(\mu_0 = 10\) .

5.1.2.1.2 t-statistic

You may have noticed that the formula for the z-score above assumes that we know the true population standard deviation ( \(\sigma\) ) when computing the standard deviation of the sampling distribution ( \(\sigma_{\bar x}\) ) in the denominator. However, the population standard deviation is usually not known in the real world and therefore represents another unknown population parameter which we have to estimate from the sample. We saw in the previous chapter that we usually use \(s\) as an estimate of \(\sigma\) and \(SE_{\bar x}\) as an estimate of \(\sigma_{\bar x}\). Intuitively, we should be more conservative regarding the critical value that we used above to assess whether we have a significant effect, to reflect this uncertainty about the true population standard deviation. That is, the threshold for a “significant” effect should be higher to safeguard against falsely claiming a significant effect when there is none. If we replace \(\sigma_{\bar x}\) by its estimate \(SE_{\bar x}\) in the formula for the z-score, we get a new test statistic (i.e., the t-statistic ) with its own distribution (the t-distribution ):

\[ t = {\bar x- \mu_0 \over SE_{\bar x}}={\bar x- \mu_0 \over s / \sqrt{n}} \]

Here, \(\bar x\) denotes the sample mean and \(s\) the sample standard deviation. The t-distribution has more probability in its “tails”, i.e., farther away from the mean. This reflects the higher uncertainty introduced by replacing the population standard deviation with its sample estimate. Intuitively, this is particularly relevant for small samples, since the uncertainty about the true population parameters decreases with increasing sample size. This is reflected by the fact that the exact shape of the t-distribution depends on the degrees of freedom , which is the sample size minus one (i.e., \(n-1\) ). To see this, the following graph shows the t-distribution with different degrees of freedom for a two-tailed test and \(\alpha = 0.05\). The grey curve shows the normal distribution.


Notice that as \(n\) gets larger, the t-distribution gets closer and closer to the normal distribution, reflecting the fact that the uncertainty introduced by \(s\) is reduced. To summarize, we now have an estimate for the standard deviation of the distribution of the sample mean (i.e., \(SE_{\bar x}\) ) and an appropriate distribution that takes into account the necessary uncertainty (i.e., the t-distribution). Let us now compute the t-statistic according to the formula above:
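A sketch of this computation, assuming the 50 sampled listening times are stored in a numeric vector `hours` (a hypothetical name, since the original data object is not shown):

```r
SE <- sd(hours) / sqrt(length(hours))  # standard error based on the sample SD
t_stat <- (mean(hours) - mu_0) / SE
t_stat
```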

Notice that the value of the t-statistic is higher than the z-score (4.29). This can be attributed to the fact that, in our sample, \(s\) underestimates the true population standard deviation \(\sigma\). Hence, the critical value would also need to be larger to adjust for this. This is what the t-distribution does. Let us compute the critical value from the t-distribution with \(n - 1\) degrees of freedom:
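For example:

```r
df <- n - 1
crit_t <- qt(1 - alpha / 2, df = df)
crit_t  # approx. 2.01 for df = 49, i.e., larger than 1.96
```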

Again, we use 0.975 and not 0.95 since we are running a two-sided test and need to account for the rejection region at the other end of the distribution. Notice that the new critical value based on the t-distribution is larger, reflecting the uncertainty when estimating \(\sigma\) from \(s\). Now we can check that the calculated test statistic is still larger than the critical value:
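```r
abs(t_stat) > crit_t  # TRUE: reject the null hypothesis
```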

The following graphic shows that the calculated test statistic (red line) falls into the rejection region, so that in our example, we would reject the null hypothesis that the true population mean is equal to \(10\).


Decision: Reject \(H_0\), given that the calculated test statistic is larger than the critical value.

Something to keep in mind here is that the test statistic is a function of the sample size. Thus, as \(n\) gets larger, the test statistic gets larger as well and we are more likely to find a significant effect. This reflects the decrease in uncertainty about the true population mean as our sample size increases.

5.1.2.2 P-values

In the previous section, we computed the test statistic, which tells us how close our sample is to the null hypothesis. The p-value corresponds to the probability that the test statistic would take a value as extreme or more extreme than the one that we actually observed, assuming that the null hypothesis is true . It is important to note that this is a conditional probability : we compute the probability of observing a sample mean (or a more extreme value) conditional on the assumption that the null hypothesis is true. Since we computed a t-statistic, the pt() function can be used to compute this probability. It is the cumulative distribution function of the t-distribution; cumulative probability means that the function returns the probability that the test statistic will take a value less than or equal to the calculated test statistic, given the degrees of freedom. However, we are interested in obtaining the probability of observing a test statistic larger than or equal to the calculated test statistic under the null hypothesis (i.e., the p-value). Thus, we need to subtract the cumulative probability from 1. In addition, since we are running a two-sided test, we need to multiply the probability by 2 to account for the rejection region at the other side of the distribution:
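A sketch, reusing `t_stat` and `n` from above:

```r
p_value <- 2 * (1 - pt(abs(t_stat), df = n - 1))  # two-sided p-value
p_value
```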

This value corresponds to the probability of observing a test statistic as extreme as, or more extreme than, the one we obtained from our sample, if the null hypothesis were true. As you can see, this probability is very low. A small p-value signals that it is unlikely to observe the calculated test statistic under the null hypothesis. To decide whether or not to reject the null hypothesis, we would now compare this value to the level of significance ( \(\alpha\) ) that we chose for our test. For this example, we adopt the widely accepted significance level of 5%, so any test result with a p-value < 0.05 would be deemed statistically significant. Note that the p-value is directly related to the value of the test statistic: the larger the (absolute) value of the test statistic, the smaller the p-value.

Decision: Reject \(H_0\) , given that the p-value is smaller than 0.05.

5.1.2.3 Confidence interval

For a given statistic calculated for a sample of observations (e.g., listening times), a 95% confidence interval can be constructed such that in 95% of samples, the true population mean will fall within its limits. If the parameter value specified in the null hypothesis (here \(10\) ) does not lie within the bounds, we reject \(H_0\). Building on what we learned about confidence intervals in the previous chapter, the 95% confidence interval based on the t-distribution can be computed as follows:

\[ CI_{lower} = {\bar x} - t_{1-{\alpha \over 2}} * SE_{\bar x} \\ CI_{upper} = {\bar x} + t_{1-{\alpha \over 2}} * SE_{\bar x} \]

It is easy to compute this interval manually:
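For example, reusing `hours`, `SE`, and `n` from above:

```r
ci_lower <- mean(hours) - qt(0.975, df = n - 1) * SE
ci_upper <- mean(hours) + qt(0.975, df = n - 1) * SE
c(ci_lower, ci_upper)
```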

The interpretation of this interval is as follows: if we (hypothetically) took 100 samples and calculated the mean and confidence interval for each of them, then the true population mean would be included in 95% of these intervals. The CI is informative when reporting the result of your test, since it provides an estimate of the uncertainty associated with the test result. From the test statistic or the p-value alone, it is not easy to judge in which range the true population parameter is located. The CI provides an estimate of this range.

Decision: Reject \(H_0\) , given that the parameter value from the null hypothesis ( \(10\) ) is not included in the interval.

To summarize, you can see that we arrive at the same conclusion (i.e., reject \(H_0\) ), irrespective of whether we use the test statistic, the p-value, or the confidence interval. However, keep in mind that rejecting the null hypothesis does not prove the alternative hypothesis (we can merely provide support for it). Rather, think of the p-value as the probability of obtaining the data we’ve collected, assuming that the null hypothesis is true. You should report the confidence interval to provide an estimate of the uncertainty associated with your test results.

5.1.3 Choosing the right test

The test statistic, as we have seen, measures how close the sample is to the null hypothesis and often follows a well-known distribution (e.g., normal, t, or chi-square). To select the correct test, various factors need to be taken into consideration. Some examples are:

  • On what scale are your variables measured (categorical vs. continuous)?
  • Do you want to test for relationships or differences?
  • If you test for differences, how many groups would you like to test?
  • For parametric tests, are the assumptions fulfilled?

The previous discussion used a one-sample t-test as an example, which requires that the variable is measured on an interval or ratio scale. If you are confronted with other settings, the following flow chart provides a rough guideline on selecting the correct test:

Flowchart for selecting an appropriate test (source: McElreath, R. (2016): Statistical Rethinking, p. 2)


For a detailed overview of the different types of tests, please also refer to this overview by UCLA.

5.1.3.1 Parametric vs. non-parametric tests

A basic distinction can be made between parametric and non-parametric tests. Parametric tests require that variables are measured on an interval or ratio scale and that the sampling distribution follows a known distribution. Non-parametric tests, on the other hand, do not require the sampling distribution to be normally distributed (they are also known as “assumption-free tests”). These tests may be used when the variable of interest is measured on an ordinal scale or when the parametric assumptions do not hold. They often rely on ranking the data instead of analyzing the actual scores. By ranking the data, information on the magnitude of differences is lost. Thus, parametric tests are more powerful if the sampling distribution is normally distributed. In this chapter, we will first focus on parametric tests and cover non-parametric tests later.

5.1.3.2 One-tailed vs. two-tailed test

For some tests you may choose between a one-tailed test and a two-tailed test . The choice depends on the hypothesis you specified, i.e., whether you specified a directional or a non-directional hypothesis. In the example above, we used a non-directional hypothesis . That is, we stated that the mean is different from the comparison value \(\mu_0\), but we did not state the direction of the effect. A directional hypothesis states the direction of the effect. For example, we might test whether the population mean is smaller than a comparison value:

\[ H_0: \mu \ge \mu_0 \\ H_1: \mu < \mu_0 \]

Similarly, we could test whether the population mean is larger than a comparison value:

\[ H_0: \mu \le \mu_0 \\ H_1: \mu > \mu_0 \]

Connected to the decision of how to phrase the hypotheses (directional vs. non-directional) is the choice of a one-tailed test versus a two-tailed test . Let’s first think about the meaning of a one-tailed test. Using a significance level of 0.05, a one-tailed test means that 5% of the total area under the probability distribution of our test statistic is located in one tail. Thus, under a one-tailed test, we test for the possibility of the relationship in one direction only, disregarding the possibility of a relationship in the other direction. In our example, a one-tailed test could test either whether the mean listening time is significantly larger or whether it is significantly smaller than the hypothesized value \(\mu_0\), but not both. Depending on the direction, the mean listening time is significantly larger (smaller) if the test statistic is located in the top (bottom) 5% of its probability distribution.

The following graph shows the critical values that our test statistic would need to surpass so that the difference between the population mean and the comparison value would be deemed statistically significant.


It can be seen that under a one-tailed test, the rejection region is located at one end of the distribution. In a two-tailed test, the rejection region is split between the two tails. As a consequence, the critical value of the test statistic is smaller under a one-tailed test, meaning that it has more power to detect an effect in the hypothesized direction. Having said that, in most applications we would like to be able to catch effects in both directions, simply because we can often not rule out that an effect exists in the direction opposite to the hypothesized one. For example, if we conducted a one-tailed test for a mean larger than some specified value but the mean turned out to be substantially smaller, then testing the directional hypothesis ( \(H_0: \mu \le \mu_0\) ) would not allow us to conclude that there is a significant effect, because there is no rejection region at this end of the distribution.

5.1.4 Summary

As we have seen, the process of hypothesis testing consists of various steps:

  • Formulate null and alternative hypotheses
  • Select an appropriate test
  • Choose the level of significance ( \(\alpha\) )
  • Descriptive statistics and data visualization
  • Conduct significance test
  • Report results and draw a marketing conclusion

In the following, we will go through the individual steps using examples for different tests.

5.2 One sample t-test

The example we used in the introduction was an example of the one-sample t-test, and we computed all statistics by hand to explain the underlying intuition. When you conduct hypothesis tests using R, you do not need to calculate these statistics by hand, since there are built-in routines to conduct the steps for you. Let us use the same example again to see how you would conduct hypothesis tests in R.

1. Formulate null and alternative hypotheses

The null hypothesis states that there is no difference between the true population mean \(\mu\) and the hypothesized value (i.e., \(10\) ), while the alternative hypothesis states the opposite:

\[ H_0: \mu = 10 \\ H_1: \mu \neq 10 \]

2. Select an appropriate test

Because we would like to test if the mean of a variable is different from a specified threshold, the one-sample t-test is appropriate. The assumptions of the test are 1) that the variable is measured using an interval or ratio scale, and 2) that the sampling distribution is normal. Both assumptions are met since 1) listening time is a ratio scale, and 2) we deem the sample size (n = 50) large enough to assume a normal sampling distribution according to the central limit theorem.

3. Choose the level of significance

We choose the conventional 5% significance level.

4. Descriptive statistics and data visualization

Provide descriptive statistics using the stat.desc() function:
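For example (assuming the sample is still stored in the vector `hours`):

```r
library(pastecs)
stat.desc(hours)  # descriptive statistics incl. mean, SE, and CI
```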

From this, we can already see that the mean is different from the hypothesized value. The question, however, remains whether this difference is statistically significant, given the sample size and the variability in the data. Since we only have one continuous variable, we can visualize the distribution in a histogram:
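A minimal plotting sketch using ggplot2:

```r
library(ggplot2)
ggplot(data.frame(hours = hours), aes(x = hours)) +
  geom_histogram(bins = 20) +
  labs(x = "Listening time (hours)", y = "Number of students")
```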


5. Conduct significance test

At the beginning of the chapter, we saw how you could conduct significance tests by hand. However, R has built-in routines that you can use to conduct the analyses. The t.test() function can be used to conduct the test. To test if the mean listening time among WU students is equal to 10 hours, you can use the following code:
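Assuming the sample is stored in `hours`:

```r
t.test(hours, mu = 10, alternative = "two.sided")
```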

Note that if you had stated a directional hypothesis (i.e., that the mean is either greater or smaller than 10 hours), you could easily amend the code to conduct a one-sided test by changing the argument alternative from 'two.sided' to either 'less' or 'greater'.

6. Report results and draw a marketing conclusion

Note that the results are the same as above, when we computed the test by hand. You could summarize the results as follows:

On average, the listening times in our sample were different from 10 hours per month (Mean = 18.99 hours, SE = 1.78). This difference was significant: t(49) = 5.058, p < .05 (95% CI = [15.42; 22.56]). Based on this evidence, we can conclude that the mean in our sample is significantly higher than the hypothesized population mean of \(10\) hours, providing evidence against the null hypothesis.

Note that in the reporting above, the number 49 in parenthesis refers to the degrees of freedom that are available from the output.

5.3 Comparing two means

In the one-sample test above, we tested the hypothesis that the population mean has some specific value \(\mu_0\) using data from only one sample. In marketing (as in many other disciplines), you will often be confronted with a situation where you wish to compare the means of two groups. For example, you may conduct an experiment and randomly split your sample into two groups, one of which receives a treatment (experimental group) while the other doesn’t (control group). In this case, the units (e.g., participants, products) in each group are different (‘between-subjects design’) and the samples are said to be independent. Hence, we would use an independent-means t-test . If you run an experiment with two experimental conditions and the same units (e.g., participants, products) were observed in both experimental conditions, the sample is said to be dependent in the sense that you have the same units in each group (‘within-subjects design’). In this case, we would need to conduct a dependent-means t-test . Both tests are described in the following sections, beginning with the independent-means t-test.

5.3.1 Independent-means t-test

Using an independent-means t-test, we can compare the means of two possibly different populations. It is, for example, quite common for online companies to test new service features by running an experiment and randomly splitting their website visitors into two groups: one is exposed to the website with the new feature (experimental group) and the other group is not exposed to the new feature (control group). This is a typical A/B-Test scenario.

As an example, imagine that a music streaming service would like to introduce a new playlist feature that lets their users access playlists created by other users. The goal is to analyze how the new service feature impacts the listening time of users. The service randomly splits a representative subset of their users into two groups and collects data about their listening times over one month. Let’s create a data set to simulate such a scenario.
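A sketch of such a simulation; the distributional parameters and group sizes below are illustrative assumptions, chosen only to roughly match the statistics reported later:

```r
set.seed(321)
hours_a <- rnorm(120, mean = 18, sd = 14)  # control group A
hours_b <- rnorm(100, mean = 28, sd = 17)  # experimental group B
hours_ab <- data.frame(
  group = c(rep("A", length(hours_a)), rep("B", length(hours_b))),
  hours = round(c(hours_a, hours_b), 2)
)
head(hours_ab)
```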

This data set contains two variables: the variable hours indicates the music listening times (in hours) and the variable group indicates from which group the observation comes, where ‘A’ refers to the control group (with the standard service) and ‘B’ refers to the experimental group (with the new playlist feature). Let’s first look at the descriptive statistics by group using the describeBy function:
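For example:

```r
library(psych)
describeBy(hours_ab$hours, hours_ab$group)
```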

From this, we can already see that there is a difference in means between groups A and B. We can also see that the number of observations is different, as is the standard deviation. The question that we would like to answer is whether there is a significant difference in mean listening times between the groups. Remember that different users are contained in each group (‘between-subjects design’) and that the observations in one group are independent of the observations in the other group. Before we see how you can easily conduct an independent-means t-test, let’s go over some theory first.

5.3.1.1 Theory

As a starting point, let us label the unknown population mean of group A (control group) in our experiment \(\mu_1\) , and that of group B (experimental group) \(\mu_2\) . In this setting, the null hypothesis would state that the mean in group A is equal to the mean in group B:

\[ H_0: \mu_1=\mu_2 \]

This is equivalent to stating that the difference between the two groups ( \(\delta\) ) is zero:

\[ H_0: \mu_1 - \mu_2=0=\delta \]

That is, \(\delta\) is the new unknown population parameter, so that the null and alternative hypothesis become:

\[ H_0: \delta = 0 \\ H_1: \delta \ne 0 \]

Remember that we usually don’t have access to the entire population, so that we cannot observe \(\delta\) and have to estimate it from a sample statistic, which we define as \(d = \bar x_1-\bar x_2\), i.e., the difference between the sample means from group A ( \(\bar x_1\) ) and group B ( \(\bar x_2\) ). But can we really estimate \(\delta\) from \(d\)? Remember from the previous chapter that we could estimate \(\mu\) from \(\bar x\), because if we (hypothetically) take a large number of samples, the distribution of the means of these samples (the sampling distribution) will be normally distributed and its mean will be (in the limit) equal to the population mean. It turns out that we can use the same underlying logic here. The above samples were drawn from two different populations with \(\mu_1\) and \(\mu_2\). Let us compute the difference in means between these two populations:
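A sketch, assuming the two (simulated) populations are stored in the hypothetical vectors `pop_a` and `pop_b`:

```r
mean(pop_a) - mean(pop_b)  # the true difference delta
```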

This means that the true difference between the mean listening times of groups A and B is -7.42. Let us now repeat the exercise from the previous chapter: we repeatedly draw a large number ( \(20,000\) ) of random samples of 100 users from each of these populations, compute the difference ( \(d\), our estimate of \(\delta\) ) for each draw, store it, and create a histogram of \(d\):
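A sketch of this simulation, again using the hypothetical population vectors `pop_a` and `pop_b`:

```r
set.seed(123)
d_sim <- replicate(20000,
                   mean(sample(pop_a, 100)) - mean(sample(pop_b, 100)))
hist(d_sim, breaks = 50,
     main = "Sampling distribution of d", xlab = "d")
abline(v = mean(pop_a) - mean(pop_b), col = "red")  # true difference
```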


This gives us the sampling distribution of the mean differences between the samples. You will notice that this distribution follows a normal distribution and is centered around the true difference between the populations. This means that, on average, the difference between two sample means \(d\) is a good estimate of \(\delta\) . In our example, the difference between \(\bar x_1\) and \(\bar x_2\) is:
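Computed from the sample data frame `hours_ab`:

```r
d <- mean(hours_ab$hours[hours_ab$group == "A"]) -
  mean(hours_ab$hours[hours_ab$group == "B"])
d
```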

Now that we have \(d\) as an estimate of \(\delta\) , how can we find out if the observed difference is significantly different from the null hypothesis (i.e., \(\delta = 0\) )?

Recall from the previous section that the standard deviation of the sampling distribution \(\sigma_{\bar x}\) (i.e., the standard error) gives us an indication of the precision of our estimate. Further recall that the standard error can be calculated as \(\sigma_{\bar x}={\sigma \over \sqrt{n}}\). So how can we calculate the standard error of the difference between two population means? According to the variance sum law , to find the variance of the sampling distribution of differences, we merely need to add together the variances of the sampling distributions of the two populations that we are comparing. To find the standard error, we only need to take the square root of the variance (because the standard error is the standard deviation of the sampling distribution and the standard deviation is the square root of the variance), so that we get:

\[ \sigma_{\bar x_1-\bar x_2} = \sqrt{{\sigma_1^2 \over n_1}+{\sigma_2^2 \over n_2}} \]

But recall that we don’t actually know the true population standard deviation, so we use \(SE_{\bar x_1-\bar x_2}\) as an estimate of \(\sigma_{\bar x_1-\bar x_2}\) :

\[ SE_{\bar x_1-\bar x_2} = \sqrt{{s_1^2 \over n_1}+{s_2^2 \over n_2}} \]

Hence, for our example, we can calculate the standard error as follows:
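For example:

```r
n1 <- sum(hours_ab$group == "A")
n2 <- sum(hours_ab$group == "B")
s1 <- sd(hours_ab$hours[hours_ab$group == "A"])
s2 <- sd(hours_ab$hours[hours_ab$group == "B"])
se_diff <- sqrt(s1^2 / n1 + s2^2 / n2)
se_diff
```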

Recall from above that we can calculate the t-statistic as:

\[ t= {\bar x - \mu_0 \over {s \over \sqrt{n}}} \]

Exchanging \(\bar x\) for \(d\) , we get

\[ t= {(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2) \over {\sqrt{{s_1^2 \over n_1}+{s_2^2 \over n_2}}}} \]

Note that according to our hypothesis \(\mu_1-\mu_2=0\) , so that we can calculate the t-statistic as:
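Reusing `d` and `se_diff` from above:

```r
t_stat <- d / se_diff  # (mu_1 - mu_2) = 0 under H0
t_stat
```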

Following the example of our one-sample t-test above, we would now need to compare this calculated test statistic to a critical value in order to assess if \(d\) is sufficiently far away from the null hypothesis to be statistically significant. To do this, we would need to know the exact t-distribution, which depends on the degrees of freedom. The problem is that deriving the degrees of freedom in this case is not that obvious. If we were willing to assume that \(\sigma_1=\sigma_2\), the correct t-distribution has \(n_1 - 1 + n_2 - 1\) degrees of freedom (i.e., the sum of the degrees of freedom of the two samples). However, because in real life we do not know whether \(\sigma_1=\sigma_2\), we need to account for this additional uncertainty. We will not go into detail here, but R automatically uses a sophisticated approach to correct the degrees of freedom, called Welch’s correction, as we will see in the subsequent application.

5.3.1.2 Application

The section above explained the theory behind the independent-means t-test and showed how to compute the statistics manually. Obviously, you don’t have to compute these statistics by hand; this section shows you how to conduct an independent-means t-test in R using the example from above.

We wish to analyze whether there is a significant difference in music listening times between groups A and B. So our null hypothesis is that the means from the two populations are the same (i.e., there is no difference), while the alternative hypothesis states the opposite:

\[ H_0: \mu_1=\mu_2\\ H_1: \mu_1 \ne \mu_2 \]

Since we have a ratio-scaled variable (i.e., listening times) and two independent groups, where the observations in one group are independent of the observations in the other group (i.e., the groups contain different units), the independent-means t-test is appropriate.

We can compute the descriptive statistics for each group separately, using the describeBy() function:
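As above:

```r
describeBy(hours_ab$hours, hours_ab$group)  # psych package, loaded earlier
```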

This already shows us that the means of groups A and B are different. We can visualize the data using a plot of means, a boxplot, and a histogram, for example:
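A minimal sketch of one such plot, a boxplot by group:

```r
library(ggplot2)
ggplot(hours_ab, aes(x = group, y = hours)) +
  geom_boxplot() +
  labs(x = "Group", y = "Listening time (hours)")
```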


To conduct the independent means t-test, we can use the t.test() function:
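Using the formula interface:

```r
t.test(hours ~ group, data = hours_ab)  # Welch's t-test (var.equal = FALSE by default)
```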

The results show that listening times were higher in the experimental group B (Mean = 28.50, SE = 1.70) compared to the control group A (Mean = 18.11, SE = 1.22). This means that the listening times were 10.39 hours higher, on average, in the experimental group (B) than in the control group (A). An independent-means t-test showed that this difference is significant: t(195.73) = -4.96, p < .05 (95% CI = [-14.51; -6.26]).

5.3.2 Dependent-means t-test

While the independent-means t-test is used when different units (e.g., participants, products) were assigned to the different condition, the dependent-means t-test is used when there are two experimental conditions and the same units (e.g., participants, products) were observed in both experimental conditions.

Imagine, for example, a slightly different setup for the above experiment: we do not assign different users to the groups; instead, a sample of 100 users gets to use the music streaming service with the new feature for one month, and we compare the music listening times of these users during the month of the experiment with their listening times in the previous month. Let us generate data for this example:
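A sketch of such a data set; the parameters are illustrative assumptions chosen to roughly match the statistics reported later:

```r
set.seed(321)
before <- rnorm(100, mean = 21, sd = 13)         # month before the experiment
after  <- before + rnorm(100, mean = 5, sd = 18) # month with the new feature
hours_dep <- data.frame(hours_a = round(before, 2),
                        hours_b = round(after, 2))
head(hours_dep)
```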

Note that the data set has almost the same structure as before, only that we now have two variables representing the listening times of each user in the month before the experiment and during the month of the experiment when the new feature was tested.

5.3.2.1 Theory

In this case, we want to test the hypothesis that there is no difference in the mean listening times between the two months. This can be expressed as follows:

\[ H_0: \mu_D = 0 \] Note that the hypothesis only refers to one population, since both observations come from the same units (i.e., users). To use consistent notation, we replace \(\mu_D\) with \(\delta\) and get:

\[ H_0: \delta = 0 \\ H_1: \delta \neq 0 \]

where \(\delta\) denotes the difference between the observed listening times from the two consecutive months of the same users . As in the previous example, since we do not observe the entire population, we estimate \(\delta\) based on the sample using \(d\), which is the difference in mean listening times between the two months in our sample. Note that we assume that everything else (e.g., the number of new releases) remained constant over the two months to keep it simple. We can show, as above, that the sampling distribution follows a normal distribution with a mean that is (in the limit) the same as the population mean. This means, again, that the difference in sample means is a good estimate for the difference in population means. Let’s compute a new variable \(d\), which is the difference between the two months:
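For example:

```r
hours_dep$d <- hours_dep$hours_b - hours_dep$hours_a
head(hours_dep)
```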

Note that we now have a new variable, which is the difference in listening times (in hours) between the two months. The mean of this difference is:
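Computed from the data frame:

```r
mean(hours_dep$d)
```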

Again, we use \(SE_{\bar d}\) as an estimate of \(\sigma_{\bar d}\):

\[ SE_{\bar d}={s \over \sqrt{n}} \] Hence, we can compute the standard error as:
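For example:

```r
n <- nrow(hours_dep)
se_d <- sd(hours_dep$d) / sqrt(n)
se_d
```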

The test statistic is therefore:

\[ t = {\bar d- \mu_0 \over SE_{\bar d}} \] with 99 (i.e., \(n-1\)) degrees of freedom. Now we can compute the t-statistic as follows:
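Reusing `se_d` from above:

```r
t_stat <- mean(hours_dep$d) / se_d  # mu_0 = 0 under H0
t_stat
```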

Note that in the case of the dependent-means t-test, we only base our hypothesis on one population and hence there is only one population variance. This is because in the dependent sample test, the observations come from the same observational units (i.e., users). Hence, there is no unsystematic variation due to potential differences between users that were assigned to the experimental groups. This means that the influence of unobserved factors (unsystematic variation) relative to the variation due to the experimental manipulation (systematic variation) is not as strong in the dependent-means test compared to the independent-means test and we don’t need to correct for differences in the population variances.

5.3.2.2 Application

Again, we don’t have to compute all this by hand since the t.test(...) function can be used to do it for us. Now we have to use the argument paired=TRUE to let R know that we are working with dependent observations.

We would like to test if there is a difference in music listening times between the two consecutive months, so our null hypothesis is that there is no difference, while the alternative hypothesis states the opposite:

\[ H_0: \mu_D = 0 \\ H_1: \mu_D \ne 0 \]

Since we have a ratio scaled variable (i.e., listening times) and two observations of the same group of users (i.e., the groups contain the same units), the dependent-means t-test is appropriate.

We can compute the descriptive statistics for each month separately, using the describe() function:
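For example:

```r
library(psych)
describe(hours_dep[, c("hours_a", "hours_b")])
```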

This already shows us that the means of the two months are different. We can visualize the data using a plot of means, a boxplot, and a histogram.

To plot the data, we need to do some restructuring first, since the variables are now stored in two different columns (“hours_a” and “hours_b”). This is also known as the “wide” format. To plot the data we need all observations to be stored in one variable. This is also known as the “long” format. We can use the melt(...) function from the reshape2 package to “melt” the two variables into one column to plot the data:
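A sketch (the column names `month` and `hours` are chosen here for illustration):

```r
library(reshape2)
hours_long <- melt(hours_dep[, c("hours_a", "hours_b")],
                   variable.name = "month", value.name = "hours")
head(hours_long)
```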

Now we are ready to plot the data:
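For example, a boxplot of the two months:

```r
library(ggplot2)
ggplot(hours_long, aes(x = month, y = hours)) +
  geom_boxplot() +
  labs(x = "Month", y = "Listening time (hours)")
```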


To conduct the dependent-means t-test, we can use the t.test() function with the argument paired = TRUE :
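For example:

```r
t.test(hours_dep$hours_b, hours_dep$hours_a, paired = TRUE)
```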

On average, the same users used the service more when it included the new feature (M = 25.96, SE = 1.68) compared to the service without the feature (M = 20.99, SE = 1.34). This difference was significant t(99) = 2.3781, p < .05 (95% CI = [0.82, 9.12]).

5.3.3 Further considerations

5.3.3.1 Type I and Type II errors

When choosing the level of significance ( \(\alpha\) ), it is important to note that this choice affects the Type I and Type II errors:

  • Type I error: When we believe there is a genuine effect in our population, when in fact there isn’t. Probability of type I error ( \(\alpha\) ) = level of significance.
  • Type II error: When we believe that there is no effect in the population, when in fact there is.

The following table shows the possible outcomes of a test (retain vs. reject \(H_0\) ), depending on whether \(H_0\) is true or false in reality:

  • \(H_0\) is true and we retain it: correct decision (probability \(1-\alpha\))
  • \(H_0\) is true but we reject it: Type I error (probability \(\alpha\))
  • \(H_0\) is false but we retain it: Type II error (probability \(\beta\))
  • \(H_0\) is false and we reject it: correct decision, i.e., power (probability \(1-\beta\))

5.3.3.2 Significance level, sample size, power, and effect size

When you plan to conduct an experiment, there are some factors that are under direct control of the researcher:

  • Significance level ( \(\alpha\) ) : The probability of finding an effect that does not genuinely exist.
  • Sample size (n) : The number of observations in each group of the experimental design.

Unlike α and n, which are specified by the researcher, the magnitude of β depends on the actual value of the population parameter. In addition, β is influenced by the effect size (e.g., Cohen’s d), which can be used to determine a standardized measure of the magnitude of an observed effect. The following parameters are affected more indirectly:

  • Power (1-β) : The probability of finding an effect that genuinely exists.
  • Effect size (d) : A standardized measure of the magnitude of the effect under the alternative hypothesis.

Although β is unknown, it is related to α. For example, if we would like to be absolutely sure that we do not falsely identify an effect which does not exist (i.e., make a type I error), this means that the probability of identifying an effect that does exist (i.e., 1-β) decreases and vice versa. Thus, an extremely low value of α (e.g., α = 0.0001) will result in intolerably high β errors. A common approach is to set α=0.05 and 1-β=0.80.

Unlike the t-value of our test, the effect size (d) is unaffected by the sample size and can be categorized as follows (see Cohen, J. 1988):

  • 0.2 (small effect)
  • 0.5 (medium effect)
  • 0.8 (large effect)

In order to test more subtle effects (smaller effect sizes), you need a larger sample size compared to the test of more obvious effects. In this paper, you can find a list of examples for different effect sizes and the number of observations you need to reliably find an effect of that magnitude. Although the exact effect size is unknown before the experiment, you might be able to make a guess about it (e.g., based on previous studies).

If you wish to obtain a standardized measure of the effect, you may compute the effect size (Cohen’s d) using the cohensD() function from the lsr package. Using the examples from the independent-means t-test above, we would use:
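For example:

```r
library(lsr)
cohensD(hours ~ group, data = hours_ab)
```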

According to the thresholds defined above, this effect would be judged to be a small-medium effect.

For the dependent-means t-test, we would use:
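For example:

```r
cohensD(hours_dep$hours_b, hours_dep$hours_a, method = "paired")
```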

According to the thresholds defined above, this effect would also be judged to be a small-medium effect.

When constructing an experimental design, your goal should be to maximize the power of the test while maintaining an acceptable significance level and keeping the sample as small as possible. To achieve this goal, you may use the pwr package, which lets you compute n , d , alpha , and power . You only need to specify three of the four input variables to get the fourth.

For example, what sample size do we need (per group) to identify an effect with d = 0.6, α = 0.05, and power = 0.8:
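```r
library(pwr)
pwr.t.test(d = 0.6, sig.level = 0.05, power = 0.8,
           type = "two.sample", alternative = "two.sided")
```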

Or we could ask, what is the power of our test with 51 observations in each group, d = 0.6, and α = 0.05:
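```r
pwr.t.test(n = 51, d = 0.6, sig.level = 0.05,
           type = "two.sample", alternative = "two.sided")
```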

5.3.3.3 P-values, stopping rules and p-hacking

From my experience, students tend to place a lot of weight on p-values when interpreting their research findings. It is therefore important to note some points that hopefully help to put the meaning of a “significant” vs. “insignificant” test result into perspective.

Significant result

  • Even if the probability of the effect being a chance result is small (e.g., less than .05) it doesn’t necessarily mean that the effect is important.
  • Very small and unimportant effects can turn out to be statistically significant if the sample size is large enough.

Insignificant result

  • If the probability of the effect occurring by chance is large (greater than .05), we fail to reject the null hypothesis. However, this does not mean that the null hypothesis is true.
  • Although an effect might not be large enough to be anything other than a chance finding, it doesn’t mean that the effect is zero.
  • In fact, two random samples will always have slightly different means that would be deemed statistically significant if the samples were large enough.

Thus, you should not base your research conclusion on p-values alone!

It is also crucial to determine the sample size before you run the experiment or before you start your analysis. Why? Consider the following example:

  • You run an experiment
  • After each respondent you analyze the data and look at the mean difference between the two groups with a t-test
  • You stop when you have a significant effect

This is called p-hacking and should be avoided at all costs. Assuming that both groups come from the same population (i.e., there is no difference in the means): What is the likelihood that the result will be significant at some point? In other words, what is the likelihood that you will draw the wrong conclusion from your data that there is an effect, while there is none? This is shown in the following graph using simulated data - the color red indicates significant test results that arise although there is no effect (i.e., false positives).


Figure 5.1: p-hacking (red indicates false positives)

5.4 Comparing several means

This chapter is primarily based on Field, A., Miles, J., & Field, Z. (2012): Discovering Statistics Using R. Sage Publications, chapters 10 & 12.

5.4.1 Introduction

In the previous section we learned how to compare means using a t-test. The t-test has some limitations since it only lets you compare 2 means and you can only use it with one independent variable. However, often we would like to compare means from 3 or more groups. In addition, there may be instances in which you manipulate more than one independent variable. For these applications, ANOVA (ANalysis Of VAriance) can be used. Hence, to conduct ANOVA you need:

  • A metric dependent variable (i.e., measured using an interval or ratio scale)
  • One or more non-metric (categorical) independent variables (also called factors)

A treatment is a particular combination of factor levels, or categories. One-way ANOVA is used when there is only one categorical variable (factor). In this case, a treatment is the same as a factor level. N-way ANOVA is used with two or more factors. Note that we are only going to talk about a single independent variable in the context of ANOVA. If you have multiple independent variables, please refer to the chapter on Regression .

Let’s use an example to see how ANOVA works. Similar to the previous example, imagine that the music streaming service experiments with a recommendation system for user-created playlists. We now have three groups: the control group “A” with the current system, treatment group “B” who have access to playlists created by other users but are not shown recommendations, and treatment group “C” who are shown recommendations for user-created playlists. As always, we load and inspect the data first:
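Since the original data file is not shown here, a sketch that simulates a comparable data set (the parameters are illustrative assumptions, chosen only to roughly match the group means reported below):

```r
set.seed(321)
hours_abc <- data.frame(
  group = rep(c("A", "B", "C"), each = 100),
  hours = round(c(rnorm(100, mean = 14, sd = 10),
                  rnorm(100, mean = 25, sd = 10),
                  rnorm(100, mean = 35, sd = 10)))
)
head(hours_abc)
```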

The null hypothesis, typically, is that all means are equal (non-directional hypothesis). Hence, in our case:

\[H_0: \mu_1 = \mu_2 = \mu_3\]

The alternative hypothesis is simply that the means are not all equal, i.e.,

\[H_1: \textrm{Means are not all equal}\]

If you wanted to put this in mathematical notation, you could also write:

\[H_1: \exists {i,j}: {\mu_i \ne \mu_j} \]

To get a first impression if there are any differences in listening times across the experimental groups, we use the describeBy(...) function from the psych package:
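For example:

```r
library(psych)
describeBy(hours_abc$hours, hours_abc$group)
```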

In addition, you should visualize the data using appropriate plots:
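For example, a plot of the group means:

```r
library(ggplot2)
ggplot(hours_abc, aes(x = group, y = hours)) +
  stat_summary(fun = mean, geom = "point", size = 3) +
  labs(x = "Group", y = "Mean listening time (hours)")
```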


Figure 5.2: Plot of means

Note that ANOVA is an omnibus test, which means that we test for an overall difference between groups. Hence, the test will only tell you if the group means are different, but it won’t tell you exactly which groups differ from one another.

So why don’t we then just conduct a series of t-tests for all combinations of groups (i.e., A vs. B, A vs. C, B vs. C)? The reason is that if we assume each test to be independent, then there is a 5% probability of falsely rejecting the null hypothesis (Type I error) for each test. In our case:

  • A vs. B (α = 0.05)
  • A vs. C (α = 0.05)
  • B vs. C (α = 0.05)

This means that the overall probability of making a Type I error is \(1-0.95^3 = 0.143\), since the probability of no Type I error is 0.95 for each of the three tests. Consequently, the Type I error probability would be 14.3%, which is above the conventional standard of 5%. This is also known as the family-wise or experiment-wise error.

5.4.2 Decomposing variance

The basic concept underlying ANOVA is the decomposition of the variance in the data. There are three variance components which we need to consider:

  • We calculate how much variability there is between scores: Total sum of squares ( \(SS_T\) )
  • We then calculate how much of this variability can be explained by the model we fit to the data (i.e., how much variability is due to the experimental manipulation): Model sum of squares ( \(SS_M\) )
  • … and how much cannot be explained (i.e., how much variability is due to individual differences in performance): Residual sum of squares ( \(SS_R\) )

The following figure shows the different variance components using a generalized data matrix:

Decomposing variance


The total variation is determined by the variation between the categories (due to our experimental manipulation) and the within-category variation that is due to extraneous factors (e.g., promotion of artists on a social network):

\[SS_T= SS_M+SS_R\]

To get a better feeling for how this relates to our data set, we can look at the data in a slightly different way. Specifically, we can use the dcast(...) function from the reshape2 package to convert the data to wide format:
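A sketch; the helper column `id` is added here so that dcast() places one observation per group in each row:

```r
library(reshape2)
hours_abc$id <- rep(1:100, 3)  # observation index within each group
hours_wide <- dcast(hours_abc, id ~ group, value.var = "hours")
head(hours_wide)
```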

In this example, \(X_1\) from the generalized data matrix above would refer to the factor level “A”, \(X_2\) to the level “B”, and \(X_3\) to the level “C”. \(Y_{11}\) refers to the first data point in the first row (i.e., “13”), \(Y_{12}\) to the second data point in the first row (i.e., “21”), etc. The grand mean ( \(\overline{Y}\) ) and the category means ( \(\overline{Y}_c\) ) can be easily computed:
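For example:

```r
mean(hours_abc$hours)                           # grand mean
tapply(hours_abc$hours, hours_abc$group, mean)  # category means
```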

To see how each variance component can be derived, let’s look at the data again. The following graph shows the individual observations by experimental group:


Figure 5.3: Sum of Squares

5.4.2.1 Total sum of squares

To compute the total variation in the data, we consider the difference between each observation and the grand mean. The grand mean is the mean over all observations in the data set. The vertical lines in the following plot measure how far each observation is away from the grand mean:


Figure 5.4: Total Sum of Squares

The formal representation of the total sum of squares ( \(SS_T\) ) is:

\[ SS_T= \sum_{i=1}^{N} (Y_i-\bar{Y})^2 \]

This means that we need to subtract the grand mean from each individual data point, square the difference, and sum up over all the squared differences. Thus, in our example, the total sum of squares can be calculated as:

\[ \begin{align} SS_T =&(13−24.67)^2 + (14−24.67)^2 + … + (2−24.67)^2\\ &+(21−24.67)^2 + (18-24.67)^2 + … + (17−24.67)^2\\ &+(30−24.67)^2 + (37−24.67)^2 + … + (28−24.67)^2\\ &=30855.64 \end{align} \]

You could also compute this in R using:
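```r
SST <- sum((hours_abc$hours - mean(hours_abc$hours))^2)
SST
```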

For the subsequent analyses, it is important to understand the concept behind the degrees of freedom . Remember that in order to estimate a population value from a sample, we need to hold something in the population constant. In ANOVA, the df are generally one less than the number of values used to calculate the SS. For example, when we estimate the population mean from a sample, we assume that the sample mean is equal to the population mean. Then, in order to estimate the population mean from the sample, all but one of the scores are free to vary and the remaining score needs to be the value that keeps the population mean constant. In our example, we used all 300 observations to calculate the sum of squares, so the total degrees of freedom ( \(df_T\) ) are:

\[\begin{equation} \begin{split} df_T = N-1=300-1=299 \end{split} \tag{5.1} \end{equation}\]

5.4.2.2 Model sum of squares

Now we know that there are 30855.64 units of total variation in our data. Next, we compute how much of the total variation can be explained by the differences between groups (i.e., our experimental manipulation). To compute the explained variation in the data, we consider the difference between the values predicted by our model for each observation (i.e., the group mean) and the grand mean. The group mean refers to the mean value within the experimental group. The vertical lines in the following plot measure how far the predicted value for each observation (i.e., the group mean) is away from the grand mean:


Figure 5.5: Model Sum of Squares

The formal representation of the model sum of squares ( \(SS_M\) ) is:

\[ SS_M= \sum_{j=1}^{c} n_j(\bar{Y}_j-\bar{Y})^2 \]

where c denotes the number of categories (experimental groups). This means that we need to subtract the grand mean from each group mean, square the difference, and sum up over all the squared differences. Thus, in our example, the model sum of squares can be calculated as:

\[ \begin{align} SS_M &= 100*(14.34−24.67)^2 + 100*(24.70−24.67)^2 + 100*(34.99−24.67)^2 \\ &= 21321.21 \end{align} \]

You could also compute this manually in R using:
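```r
grand_mean <- mean(hours_abc$hours)
group_means <- tapply(hours_abc$hours, hours_abc$group, mean)
group_n <- table(hours_abc$group)  # number of observations per group
SSM <- sum(group_n * (group_means - grand_mean)^2)
SSM
```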

In this case, we used the three group means to calculate the sum of squares, so the model degrees of freedom ( \(df_M\) ) are:

\[ df_M= c-1=3-1=2 \]

5.4.2.3 Residual sum of squares

Lastly, we calculate the amount of variation that cannot be explained by our model. In ANOVA, this is the sum of squared distances between what the model predicts for each data point (i.e., the group means) and the observed values. In other words, this refers to the amount of variation that is caused by extraneous factors, such as differences between the users within the different experimental groups. The vertical lines in the following plot measure how far each observation is away from the group mean:


Figure 5.6: Residual Sum of Squares

The formal representation of the residual sum of squares ( \(SS_R\) ) is:

\[ SS_R= \sum_{j=1}^{c} \sum_{i=1}^{n} ({Y}_{ij}-\bar{Y}_{j})^2 \]

This means that we need to subtract the group mean from each individual observation, square the difference, and sum up over all the squared differences. Thus, in our example, the residual sum of squares can be calculated as:

\[ \begin{align} SS_R =& (13−14.34)^2 + (14−14.34)^2 + … + (2−14.34)^2 \\ +&(21−24.7)^2 + (18−24.7)^2 + … + (17−24.7)^2 \\ +& (30−34.99)^2 + (37−34.99)^2 + … + (28−34.99)^2 \\ =& 9534.43 \end{align} \]
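A corresponding sketch in R:

```r
group_mean_obs <- ave(hours_abc$hours, hours_abc$group)  # group mean for each observation
SSR <- sum((hours_abc$hours - group_mean_obs)^2)
SSR
```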

In this case, we used the 100 observations in each of the three groups to calculate the sums of squares, so the residual degrees of freedom ( \(df_R\) ) are:

\[ \begin{align} df_R=& (n_1-1)+(n_2-1)+(n_3-1) \\ =&(100-1)+(100-1)+(100-1)=297 \end{align} \]

5.4.2.4 Effect strength

Once you have computed the different sum of squares, you can investigate the effect strength. \(\eta^2\) is a measure of the variation in Y that is explained by X:

\[ \eta^2= \frac{SS_M}{SS_T}=\frac{21321.21}{30855.64}=0.69 \]

To compute this in R:
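Reusing the sums of squares from above:

```r
eta_sq <- SSM / SST
eta_sq
```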

The statistic can only take values between 0 and 1. It is equal to 0 when all the category means are equal, indicating that X has no effect on Y. In contrast, it has a value of 1 when there is no variability within each category of X but there is some variability between categories.

5.4.2.5 Test of significance

How can we determine whether the effect of X on Y is significant?

  • First, we calculate the fit of the most basic model (i.e., the grand mean)
  • Then, we calculate the fit of the “best” model (i.e., the group means)
  • A good model should fit the data significantly better than the basic model
  • The F-statistic or F-ratio compares the amount of systematic variance in the data to the amount of unsystematic variance

A first intuition is to relate the variation explained by the model (systematic variation) to the variation attributed to the error (unsystematic variation):

\(\frac{SS_M}{SS_R}\)

However, since these are summed values, their magnitude is influenced by the number of scores that were summed. For example, to calculate \(SS_M\) we only used the sum of 3 values (the group means), while \(SS_T\) and \(SS_R\) were based on all 300 observations. Thus, we calculate the average sum of squares (“mean square”) to compare the average amount of systematic vs. unsystematic variation, by dividing the SS values by the degrees of freedom associated with the respective statistic.

Mean square due to X:

\[ MS_M= \frac{SS_M}{df_M}=\frac{SS_M}{c-1}=\frac{21321.21}{(3-1)}=10660.61 \]

Mean square due to error:

\[ MS_R= \frac{SS_R}{df_R}=\frac{SS_R}{N-c}=\frac{9534.43}{(300-3)}=32.10 \]

Now, we compare the amount of variability explained by the model (experiment) to the error in the model (variation due to extraneous variables). If the model explains more variability than it leaves unexplained, then the experimental manipulation has had a significant effect on the outcome (DV). The F-ratio can be derived as follows:

\[ F= \frac{MS_M}{MS_R}=\frac{\frac{SS_M}{c-1}}{\frac{SS_R}{N-c}}=\frac{\frac{21321.21}{(3-1)}}{\frac{9534.43}{(300-3)}}=332.08 \]

You can easily compute this in R:
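A minimal sketch, reusing the sums of squares and degrees of freedom from above:

```r
f_ratio <- (ss_m / (3 - 1)) / (ss_r / (300 - 3))
f_ratio  # approx. 332.08
```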

This statistic follows the F distribution with (m = c – 1) and (n = N – c) degrees of freedom. This means that, like the \(\chi^2\) distribution, the shape of the F-distribution depends on the degrees of freedom. In this case, the shape depends on the degrees of freedom associated with the numerator and denominator used to compute the F-ratio. The following figure shows the shape of the F-distribution for different degrees of freedom:

The F distribution

The outcome of the test is one of the following:

  • If the null hypothesis of equal category means is not rejected, then the independent variable does not have a significant effect on the dependent variable
  • If the null hypothesis is rejected, then the effect of the independent variable is significant

For 2 and 297 degrees of freedom, the critical value of F is 3.026 for α=0.05. As usual, you can either look up these values in a table or use the appropriate function in R:
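For example, using the quantile function of the F distribution:

```r
qf(0.95, df1 = 2, df2 = 297)  # approx. 3.026
```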

The output tells us that the calculated test statistic exceeds the critical value. We can also show the test result visually:

Visual depiction of the test result

Thus, we conclude that because \(F_{cal} = 332.08 > F_{crit} = 3.03\), \(H_0\) is rejected!

Interpretation: one or more of the differences between means are statistically significant.

Reporting: There was a significant effect of promotion on sales levels, F(2,297) = 332.08, p < 0.05, \(\eta^2\) = 0.69.

Remember: This doesn’t tell us where the differences between groups lie. To find out which group means exactly differ, we need to use post-hoc procedures (see below).

You don’t have to compute these statistics manually! Luckily, there is a function for ANOVA in R, which does the above calculations for you as we will see in the next section.

5.4.3 One-way ANOVA

5.4.3.1 Basic ANOVA

As already indicated, one-way ANOVA is used when there is only one categorical variable (factor). Before conducting ANOVA, you need to check if the assumptions of the test are fulfilled. The assumptions of ANOVA are discussed in the following sections.

Independence of observations

The observations in the groups should be independent. Because we randomly assigned the listeners to the experimental conditions, this assumption can be assumed to be met.

Distributional assumptions

ANOVA is relatively immune to violations of the normality assumption when sample sizes are large due to the Central Limit Theorem. However, if your sample is small (i.e., n < 30 per group), you may nevertheless want to check the normality of your data, e.g., by using the Shapiro-Wilk test or a Q-Q plot. In our example, we have 100 observations in each group, which is plenty, but let’s create another example with only 10 observations in each group. In the latter case, we cannot rely on the Central Limit Theorem and we should test the normality of our data. This can be done using the Shapiro-Wilk test, which has the null hypothesis that the data is normally distributed. Hence, an insignificant test result means that the data can be assumed to be approximately normally distributed:
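A minimal sketch, assuming the small data set is stored in a (hypothetical) data frame music_data_small with the listening time in hours and the experimental condition in group:

```r
# Shapiro-Wilk test separately for each experimental group
by(music_data_small$hours, music_data_small$group, shapiro.test)
```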

Since the test result is insignificant for all groups, we can conclude that the data approximately follow a normal distribution.

We could also test the distributional assumptions visually using a Q-Q plot (i.e., quantile-quantile plot). This plot can be used to assess if a set of data plausibly came from some theoretical distribution, such as the Normal distribution. Since this is just a visual check, it is somewhat subjective, but it may help us to judge if our assumption is plausible and, if not, which data points contribute to the violation. A Q-Q plot is a scatterplot created by plotting two sets of quantiles against one another. If both sets of quantiles came from the same distribution, we should see the points forming a line that’s roughly straight. In other words, Q-Q plots take your sample data, sort the values in ascending order, and then plot them versus quantiles calculated from a theoretical distribution. Quantiles are often referred to as “percentiles” and refer to the points in your data below which a certain proportion of your data fall. Recall, for example, the standard Normal distribution with a mean of 0 and a standard deviation of 1. Since the 50th percentile (or 0.5 quantile) is 0, half the data lie below 0. The 95th percentile (or 0.95 quantile) is about 1.64, which means that 95 percent of the data lie below 1.64. The 97.5th percentile (or 0.975 quantile) is about 1.96, which means that 97.5% of the data lie below 1.96. In the Q-Q plot, the number of quantiles is selected to match the size of your sample data.

To create the Q-Q plot for the normal distribution, you may use the qqnorm() function, which takes the data to be tested as an argument. Using the qqline() function subsequently on the data creates the line on which the data points should fall based on the theoretical quantiles. If the individual data points deviate a lot from this line, it means that the data is not likely to follow a normal distribution.
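For example, for one of the experimental groups (variable names as assumed above):

```r
hours_a <- music_data_small$hours[music_data_small$group == "A"]
qqnorm(hours_a)  # plot sample quantiles against theoretical normal quantiles
qqline(hours_a)  # add the reference line
```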

Q-Q plot 1

Figure 5.7: Q-Q plot 1

Q-Q plot 2

Figure 5.8: Q-Q plot 2

Q-Q plot 3

Figure 5.9: Q-Q plot 3

The Q-Q plots suggest an approximately Normal distribution. If the assumption had been violated, you might consider transforming your data or resort to a non-parametric test.

Homogeneity of variance

Let’s return to our original dataset with 100 observations in each group for the rest of the analysis.

You can test the homogeneity of variances in R using Levene’s test:
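A minimal sketch, assuming the full data set is stored in a (hypothetical) data frame music_data with columns hours and group:

```r
library(car)  # provides leveneTest()
leveneTest(hours ~ group, data = music_data)
```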

The null hypothesis of the test is that the group variances are equal. Thus, if the test result is significant it means that the variances are not equal. If we cannot reject the null hypothesis (i.e., the group variances are not significantly different), we can proceed with the ANOVA as follows:
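For example:

```r
aov_model <- aov(hours ~ group, data = music_data)
summary(aov_model)
```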

You can see that the p-value is smaller than 0.05. This means that, if there really was no difference between the population means (i.e., the Null hypothesis was true), the probability of the observed differences (or larger differences) is less than 5%.

To compute \(\eta^2\) from the output, we can extract the relevant sums of squares as follows:
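A minimal sketch based on the model object created above:

```r
aov_table <- summary(aov_model)[[1]]
eta_sq <- aov_table$`Sum Sq`[1] / sum(aov_table$`Sum Sq`)
eta_sq  # approx. 0.69
```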

You can see that the results match the results from our manual computation above ( \(\eta^2 =\) 0.69).

The aov() function also automatically generates some plots that you can use to judge if the model assumptions are met. We will inspect two of the plots here.

We will use the first plot to inspect if the residual variances are equal across the experimental groups:
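For example:

```r
plot(aov_model, 1)  # residuals vs. fitted values (i.e., the group means)
```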


Generally, the residual variance (i.e., the range of values on the y-axis) should be the same for different levels of our independent variable. The plot shows that there are some slight differences. Notably, the range of residuals is higher in group “B” than in group “C”. However, the differences are not that large and, since Levene’s test could not reject the null hypothesis of equal variances, we conclude that the variances are similar enough in this case.

The second plot can be used to test the assumption that the residuals are approximately normally distributed. We use a Q-Q plot to test this assumption:
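For example:

```r
plot(aov_model, 2)  # Q-Q plot of the standardized residuals
```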


The plot suggests that the residuals are approximately normally distributed. We could also test this by extracting the residuals from the ANOVA output using the resid() function and applying the Shapiro-Wilk test:
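For example:

```r
shapiro.test(resid(aov_model))
```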

Confirming the impression from the Q-Q plot, we cannot reject the Null that the residuals are approximately normally distributed.

Note that if Levene’s test had been significant (i.e., the variances were not equal), we would have needed to either resort to non-parametric tests (see below), or compute Welch’s F-ratio instead:
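A minimal sketch using base R:

```r
# var.equal = FALSE (the default) yields Welch's F-ratio
oneway.test(hours ~ group, data = music_data)
```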

You can see that the results are fairly similar, since the variances turned out to be fairly equal across groups.

5.4.3.2 Post-hoc tests

Provided that significant differences were detected by the overall ANOVA, you can find out which group means are different using post hoc procedures. Post hoc procedures are designed to conduct pairwise comparisons of all different combinations of the treatment groups, correcting the level of significance for each test such that the overall Type I error rate (α) across all comparisons remains at 0.05.

In other words, we rejected \(H_0: \mu_1 = \mu_2 = \mu_3\), and now we would like to test:

\[H_0: \mu_1 = \mu_2\]

\[H_0: \mu_1 = \mu_3\]

\[H_0: \mu_2 = \mu_3\]

There are several post hoc procedures available to choose from. In this tutorial, we will cover Bonferroni and Tukey’s HSD (“honest significant differences”). Both tests control for the family-wise error rate. Bonferroni tends to have more power when the number of comparisons is small, whereas Tukey’s HSD is better when testing large numbers of means.

5.4.3.2.1 Bonferroni

One of the most popular (and easiest) methods to correct for the family-wise error rate is to conduct the individual t-tests and divide α by the number of comparisons (\(k\)):

\[ p_{CR}= \frac{\alpha}{k} \]

In our example with three groups:

\[p_{CR}= \frac{0.05}{3}=0.017\]

Thus, the “corrected” critical p-value is now 0.017 instead of 0.05 (i.e., the critical t value is higher). You can implement the Bonferroni procedure in R using:
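A minimal sketch, using the variable names assumed above:

```r
pairwise.t.test(music_data$hours, music_data$group,
                p.adjust.method = "bonferroni")
```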

In the output, you will get the corrected p-values for the individual tests. In our example, we can reject \(H_0\) of equal means for all three tests, since p < 0.05 for all combinations of groups.

Note the difference between the results from the post-hoc test and those from individual t-tests. For example, when we test the “B” vs. “C” groups, the result from a t-test would be:
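A minimal sketch, dropping the unused factor level before running the test:

```r
data_bc <- droplevels(subset(music_data, group %in% c("B", "C")))
t.test(hours ~ group, data = data_bc)
```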

Usually, the p-value is lower in the t-test, reflecting the fact that the family-wise error is not corrected (i.e., the test is less conservative). In this case, the p-values are extremely small in both tests and thus practically indistinguishable.

5.4.3.2.2 Tukey’s HSD

Tukey’s HSD also compares all possible pairs of means (two-by-two combinations; i.e., like a t-test, except that it corrects for family-wise error rate).

Test statistic:

\[\begin{equation} \begin{split} HSD= q\sqrt{\frac{MS_R}{n_c}} \end{split} \tag{5.2} \end{equation}\]

  • \(q\) = value from the studentized range table (see e.g., here )
  • \(MS_R\) = mean square error from the ANOVA
  • \(n_c\) = number of observations per group
  • Decision: Reject \(H_0\) if

\[|\bar{Y}_i-\bar{Y}_j | > HSD\]

The value from the studentized range table can be obtained using the qtukey() function.
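For example:

```r
qtukey(0.95, nmeans = 3, df = 297)  # approx. 3.33
```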

\[HSD= 3.33\sqrt{\frac{32.10}{100}}=1.89\]

Since all mean differences between groups are larger than 1.89, we can reject the null hypothesis for all individual tests, confirming the results from the Bonferroni test. To compute Tukey’s HSD, we can use the appropriate function from the multcomp package.
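A minimal sketch, assuming group is coded as a factor in the model estimated above:

```r
library(multcomp)  # provides glht()
tukey_res <- glht(aov_model, linfct = mcp(group = "Tukey"))
summary(tukey_res)  # pairwise tests with family-wise error correction
```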

We may also plot the result for the mean differences incl. their confidence intervals:
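For example, based on the object created above:

```r
plot(tukey_res)  # CIs of the pairwise mean differences
```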

Tukey's HSD

Figure 5.10: Tukey’s HSD

You can see that the CIs do not cross zero, which means that the true difference between group means is unlikely to be zero.

Reporting of post hoc results:

The post hoc tests based on Bonferroni and Tukey’s HSD revealed that people listened to music significantly more when:

  • they had access to user created playlists vs. those who did not,
  • they got recommendations vs. those who did not. This holds for both the control group “A” and treatment “B”.

The following video summarizes how to conduct a one-way ANOVA in R.

5.5 Non-parametric tests

Non-parametric tests (a.k.a. “assumption-free tests”) do not require the sampling distribution to be normally distributed. These tests may be used when the variable of interest is measured on an ordinal scale or when the parametric assumptions do not hold. They often rely on ranking the data instead of analyzing the actual scores. By ranking the data, information on the magnitude of differences is lost. Thus, parametric tests are more powerful if the sampling distribution is normally distributed.

When should you use non-parametric tests?

  • When your DV is measured on an ordinal scale
  • When your data is better represented by the median (e.g., there are outliers that you can’t remove)
  • When the assumptions of parametric tests are not met (e.g., the sampling distribution is not normally distributed)
  • When you have a very small sample size (i.e., the central limit theorem does not apply)

5.5.1 Mann-Whitney U Test (a.k.a. Wilcoxon rank-sum test)

The Mann-Whitney U test is a non-parametric test of differences between groups, similar to the two sample t-test. In contrast to the two sample t-test it only requires ordinally scaled data and relies on weaker assumptions. Thus it is often useful if the assumptions of the t-test are violated, especially if the data is not on a ratio scale. The following assumptions must be fulfilled for the test to be applicable:

  • The dependent variable is at least ordinally scaled (i.e. a ranking between values can be established)
  • The independent variable has only two levels
  • A between-subjects design is used (i.e., the subjects are not matched across conditions)

Intuitively, the test compares the frequency of low and high ranks between groups. Under the null hypothesis, the amount of high and low ranks should be roughly equal in the two groups. This is achieved through comparing the expected sum of ranks to the actual sum of ranks.

As an example, we will be using data obtained from a field experiment with random assignment. In a music download store, new releases were randomly assigned either to an experimental group, where they were sold at a reduced price (i.e., 7.95€), or to a control group, where they were sold at the standard price (9.95€). A representative sample of 102 new releases was drawn and these albums were randomly assigned to the two groups (i.e., 51 albums per group). The sales were tracked over one day.

Let’s load and investigate the data first:
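A minimal sketch; the file name is made up for illustration:

```r
music_sales <- read.csv("music_sales.csv")  # hypothetical file name
head(music_sales)
str(music_sales)
```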

Inspect descriptives (overall and by group).
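For example, assuming the data frame contains the unit sales in unit_sales and the price condition in group (both names are assumptions):

```r
library(psych)  # provides describe() and describeBy()
describe(music_sales$unit_sales)                       # overall
describeBy(music_sales$unit_sales, music_sales$group)  # by experimental group
```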

Create boxplot and plot of means.
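A minimal sketch using ggplot2 and the assumed variable names:

```r
library(ggplot2)
# Boxplot
ggplot(music_sales, aes(x = group, y = unit_sales)) +
  geom_boxplot() +
  theme_bw()
# Plot of means with standard-error bars
ggplot(music_sales, aes(x = group, y = unit_sales)) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  theme_bw()
```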

Boxplot

Figure 5.11: Boxplot

Let’s assume that one of the parametric assumptions has been violated and we needed to conduct a non-parametric test. Then, the Mann-Whitney U test is implemented in R using the function wilcox.test() . Using the experimental group as the independent variable and unit sales as the dependent variable, the test can be executed as follows:
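A minimal sketch with the assumed variable names:

```r
wilcox.test(unit_sales ~ group, data = music_sales)
```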

The p-value is smaller than 0.05, which leads us to reject the null hypothesis, i.e., the test yields evidence that the reduced price leads to higher sales.

5.5.2 Wilcoxon signed-rank test

The Wilcoxon signed-rank test is a non-parametric test used to analyze the difference between paired observations, analogously to the paired t-test. It can be used when measurements come from the same observational units but the distributional assumptions of the paired t-test do not hold, because it does not require any assumptions about the distribution of the measurements. Since we subtract two values, however, the test requires that the dependent variable is at least interval scaled, meaning that intervals have the same meaning for different points on our measurement scale.

Under the null hypothesis \(H_0\) , the differences of the measurements should follow a symmetric distribution around 0, meaning that, on average, there is no difference between the two matched samples. \(H_1\) states that the distribution’s mean is non-zero.

As an example, let’s consider a slightly different experimental setup for the music download store. Imagine that new releases were either sold at a reduced price (i.e., 7.95€), or at the standard price (9.95€). Every time a customer came to the store, the prices were randomly determined for every new release. This means that the same 51 albums were either sold at the standard price or at the reduced price and this price was determined randomly. The sales were then recorded over one day. Note the difference to the previous case, where we randomly split the sample and assigned 50% of products to each condition. Now, we randomly vary prices for all albums between high and low prices.

Again, let’s assume that one of the parametric assumptions has been violated and we needed to conduct a non-parametric test. Then the Wilcoxon signed-rank test can be performed with the same command as the Mann-Whitney U test, provided that the argument paired is set to TRUE .
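A minimal sketch, assuming the paired measurements are stored in two (hypothetical) columns of a wide-format data frame:

```r
wilcox.test(music_sales_paired$unit_sales_low_price,
            music_sales_paired$unit_sales_high_price,
            paired = TRUE)
```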

Using the 95% confidence level, the result would suggest a significant effect of price on sales (i.e., p < 0.05).

5.5.3 Kruskal-Wallis test

The Kruskal-Wallis test is appropriate in the following situations:

  • When the dependent variable is measured on an ordinal scale and we want to compare more than 2 groups
  • When the assumptions of independent ANOVA are not met (e.g., assumptions regarding the sampling distribution in small samples)

The Kruskal–Wallis test is the non-parametric counterpart of the one-way independent ANOVA. It is designed to test for significant differences in population medians when you have more than two samples (otherwise you would use the Mann-Whitney U-test). The theory is very similar to that of the Mann–Whitney U-test since it is also based on ranked data. The Kruskal-Wallis test is carried out using the kruskal.test() function. Using the same data as before, we type:
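A minimal sketch, assuming a (hypothetical) data frame music_sales_promo with unit sales in unit_sales and a three-level factor promotion:

```r
kruskal.test(unit_sales ~ promotion, data = music_sales_promo)
```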

The test statistic follows a chi-square distribution and, since the test is significant (p < 0.05), we can conclude that there are significant differences in population medians. Provided that the overall effect is significant, you may perform a post hoc test to find out which groups are different. To get a first impression, we can plot the data using a boxplot:

Boxplot

Figure 5.12: Boxplot

To test for differences between groups, we can, for example, apply post hoc tests according to Nemenyi for pairwise multiple comparisons of the ranked data using the appropriate function from the PMCMR package.
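A minimal sketch under the same naming assumptions:

```r
library(PMCMR)  # provides posthoc.kruskal.nemenyi.test()
posthoc.kruskal.nemenyi.test(x = music_sales_promo$unit_sales,
                             g = music_sales_promo$promotion,
                             dist = "Tukey")
```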

The results reveal that there is a significant difference between the “low” and “high” promotion groups. Note that the results are different compared to the results from the parametric test above. This difference occurs because non-parametric tests have less power to detect differences between groups, since we lose information by ranking the data. Thus, you should rely on parametric tests if their assumptions are met.

5.6 Categorical data

In some instances, you will be confronted with differences between proportions, rather than differences between means. For example, you may conduct an A/B-Test and wish to compare the conversion rates between two advertising campaigns. In this case, your data is binary (0 = no conversion, 1 = conversion) and the sampling distribution for such data is binomial. While binomial probabilities are difficult to calculate, we can use a Normal approximation to the binomial when n is large (>100) and the true likelihood of a 1 is not too close to 0 or 1.

Let’s use an example: assume a call center where service agents call potential customers to sell a product. We consider two call center agents:

  • Service agent 1 talks to 300 customers and gets 200 of them to buy (conversion rate=2/3)
  • Service agent 2 talks to 300 customers and gets 100 of them to buy (conversion rate=1/3)

As always, we load the data first:
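Since the raw data file is not shown here, a minimal sketch reconstructs the data from the description above:

```r
call_data <- data.frame(
  agent = factor(rep(c("agent_1", "agent_2"), each = 300)),
  conversion = c(rep(1, 200), rep(0, 100),   # agent 1: 200 of 300 buy
                 rep(1, 100), rep(0, 200))   # agent 2: 100 of 300 buy
)
```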

Next, we create a table to check the relative frequencies:
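For example:

```r
prop.table(table(call_data$agent, call_data$conversion), margin = 1)
```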

We could also plot the data to visualize the frequencies using ggplot:
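A minimal sketch:

```r
library(ggplot2)
ggplot(call_data, aes(x = agent, fill = factor(conversion))) +
  geom_bar(position = "fill") +
  labs(y = "Proportion", fill = "Conversion") +
  theme_bw()
```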

proportion of conversions per agent (stacked bar chart)

Figure 5.13: proportion of conversions per agent (stacked bar chart)

… or using the mosaicplot() function:
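For example:

```r
mosaicplot(table(call_data$agent, call_data$conversion),
           main = "Proportion of conversions per agent")
```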

proportion of conversions per agent (mosaic plot)

Figure 5.14: proportion of conversions per agent (mosaic plot)

5.6.1 Confidence intervals for proportions

Recall that we can use confidence intervals to determine the range of values that the true population parameter will take with a certain level of confidence based on the sample. Similar to the confidence interval for means, we can compute a confidence interval for proportions. The (1- \(\alpha\) )% confidence interval for proportions is approximately

\[ CI = p\pm z_{1-\frac{\alpha}{2}}*\sqrt{\frac{p*(1-p)}{N}} \]

where \(\sqrt{p*(1-p)}\) is the equivalent of the standard deviation in the formula for the confidence interval for means. Based on the equation, it is easy to compute the confidence intervals for the conversion rates of the call center agents:
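A minimal sketch of the computation:

```r
ci_prop <- function(p, n, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  se <- sqrt(p * (1 - p) / n)
  c(lower = p - z * se, upper = p + z * se)
}
ci_prop(200 / 300, 300)  # agent 1
ci_prop(100 / 300, 300)  # agent 2
```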

Similar to testing for differences in means, we could also ask: Is agent 1 twice as likely as agent 2 to convert a customer? Or, to state it formally:

\[H_0: \pi_1=\pi_2 \\ H_1: \pi_1\ne \pi_2\]

where \(\pi\) denotes the population parameter associated with the proportion in the respective population. One approach to test this is based on confidence intervals to estimate the difference between two populations. We can compute an approximate confidence interval for the difference between the proportion of successes in group 1 and group 2, as:

\[ CI = p_1-p_2\pm z_{1-\frac{\alpha}{2}}*\sqrt{\frac{p_1*(1-p_1)}{n_1}+\frac{p_2*(1-p_2)}{n_2}} \]

If the confidence interval includes zero, then the data does not suggest a difference between the groups. Let’s compute the confidence interval for differences in the proportions by hand first:
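A minimal sketch:

```r
p1 <- 200 / 300; n1 <- 300
p2 <- 100 / 300; n2 <- 300
se_diff <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
(p1 - p2) + c(-1, 1) * qnorm(0.975) * se_diff  # approx. [0.26, 0.41]
```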

Now we can see that the 95% confidence interval estimate of the difference between the proportion of conversions for agent 1 and the proportion of conversions for agent 2 is between 26% and 41%. This interval tells us the range of plausible values for the difference between the two population proportions. According to this interval, zero is not a plausible value for the difference (i.e., interval does not cross zero), so we reject the null hypothesis that the population proportions are the same.

Instead of computing the intervals by hand, we could also use the prop.test() function:
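For example:

```r
prop.test(x = c(200, 100), n = c(300, 300))
```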

Note that the prop.test() function uses a slightly different (more accurate) way to compute the confidence interval (Wilson’s score method). It provides a better approximation, particularly for smaller sample sizes. That’s why the confidence interval in the output slightly deviates from the manual computation above, which uses the Wald interval.

You can also see that the output from the prop.test() includes the results from a \(\chi^2\) test for the equality of proportions (which will be discussed below) and the associated p-value. Since the p-value is less than 0.05, we reject the null hypothesis of equal probability. Thus, the reporting would be:

The test showed that the conversion rate for agent 1 was higher by 33 percentage points. This difference is significant, \(\chi^2(1) = 70\), p < .05 (95% CI = [0.25, 0.41]).

5.6.2 Chi-square test

In the previous section, we saw how we can compute the confidence interval for the difference between proportions to decide on whether or not to reject the null hypothesis. Whenever you would like to investigate the relationship between two categorical variables, the \(\chi^2\) test may be used to test whether the variables are independent of each other. It achieves this by comparing the expected number of observations in a group to the actual values. Let’s continue with the example from the previous section. Under the null hypothesis, the two variables agent and conversion in our contingency table are independent (i.e., there is no relationship). This means that the frequency in each field will be roughly proportional to the probability of an observation being in that category, calculated under the assumption that they are independent. The difference between that expected quantity and the actual quantity can be used to construct the test statistic. The test statistic is computed as follows:

\[ \chi^2=\sum_{i=1}^{J}\frac{(f_o-f_e)^2}{f_e} \]

where \(J\) is the number of cells in the contingency table, \(f_o\) are the observed cell frequencies and \(f_e\) are the expected cell frequencies. The larger the differences, the larger the test statistic and the smaller the p-value.

The observed cell frequencies can easily be seen from the contingency table:
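For example:

```r
obs_freq <- table(call_data$agent, call_data$conversion)
obs_freq
```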

The expected cell frequencies can be calculated as follows:

\[ f_e=\frac{(n_r*n_c)}{n} \]

where \(n_r\) are the total observed frequencies per row, \(n_c\) are the total observed frequencies per column, and \(n\) is the total number of observations. Thus, the expected cell frequencies under the assumption of independence can be calculated as:
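A minimal sketch based on the contingency table from above:

```r
n <- sum(obs_freq)
exp_freq <- outer(rowSums(obs_freq), colSums(obs_freq)) / n
exp_freq  # 150 in every cell for the stylized counts above
```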

To sum up, these are the expected cell frequencies

… and these are the observed cell frequencies

To obtain the test statistic, we simply plug the values into the formula:
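A minimal sketch; note that the exact value depends on the underlying data set (the stylized counts above yield approx. 66.7):

```r
chi2_stat <- sum((obs_freq - exp_freq)^2 / exp_freq)
chi2_stat
```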

The test statistic is \(\chi^2\) distributed. The chi-square distribution is a non-symmetric distribution. Actually, there are many different chi-square distributions, one for each degree of freedom, as shown in the following figure.

The chi-square distribution

Figure 5.15: The chi-square distribution

You can see that as the degrees of freedom increase, the chi-square curve approaches a normal distribution. To find the critical value, we need to specify the corresponding degrees of freedom, given by:

\[ df=(r-1)*(c-1) \]

where \(r\) is the number of rows and \(c\) is the number of columns in the contingency table. Recall that degrees of freedom are generally the number of values that can vary freely when calculating a statistic. In a 2 by 2 table, as in our case, we have 2 variables with 2 levels each, and for each variable only 1 level can vary freely once the totals are fixed. Hence, in our example the degrees of freedom can be calculated as:
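For example:

```r
df <- (nrow(obs_freq) - 1) * (ncol(obs_freq) - 1)
df  # 1
```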

Now, we can derive the critical value given the degrees of freedom and the level of confidence using the qchisq() function and test if the calculated test statistic is larger than the critical value:
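A minimal sketch:

```r
crit_value <- qchisq(0.95, df = 1)
crit_value              # approx. 3.84
chi2_stat > crit_value  # TRUE, i.e., we reject the null hypothesis
```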

Visual depiction of the test result

Figure 5.16: Visual depiction of the test result

We could also compute the p-value using the pchisq() function, which tells us the probability of the observed cell frequencies if the null hypothesis was true (i.e., there was no association):
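For example:

```r
pchisq(chi2_stat, df = 1, lower.tail = FALSE)
```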

The test statistic can also be calculated in R directly on the contingency table with the function chisq.test() .
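For example (correct = FALSE reproduces the manual computation; see the note on the Yates’ correction below):

```r
chisq.test(obs_freq, correct = FALSE)
```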

Since the p-value is smaller than 0.05 (i.e., the calculated test statistic is larger than the critical value), we reject \(H_0\) that the two variables are independent.

Note that the test statistic is sensitive to the sample size. To see this, let’s assume that we have a sample of 100 observations instead of 1000 observations:
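As a hypothetical illustration of this sensitivity (the counts below are made up and do not correspond to the call center data), consider two tables with identical proportions but different sample sizes:

```r
# 60% vs. 50% conversion, n = 200: not significant
chisq.test(matrix(c(60, 40, 50, 50), nrow = 2, byrow = TRUE), correct = FALSE)
# same proportions, n = 2000: highly significant
chisq.test(matrix(c(600, 400, 500, 500), nrow = 2, byrow = TRUE), correct = FALSE)
```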

You can see that even though the proportions haven’t changed, the test is insignificant now. The following equation lets you compute a measure of the effect size, which is insensitive to sample size:

\[ \phi=\sqrt{\frac{\chi^2}{n}} \]

The following guidelines are used to determine the magnitude of the effect size (Cohen, 1988):

  • 0.1 (small effect)
  • 0.3 (medium effect)
  • 0.5 (large effect)

In our example, we can compute the effect sizes for the large and small samples as follows:
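A minimal sketch, using the two hypothetical tables from above:

```r
phi_small <- sqrt(2.02 / 200)    # test statistic and n of the small table
phi_large <- sqrt(20.2 / 2000)   # test statistic and n of the large table
c(phi_small, phi_large)          # both approx. 0.10
```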

You can see that the statistic is insensitive to the sample size.

Note that the Φ coefficient is appropriate for two dichotomous variables (resulting from a 2 x 2 table as above). If any of your nominal variables has more than two categories, Cramér’s V should be used instead:

\[ V=\sqrt{\frac{\chi^2}{n*df_{min}}} \]

where \(df_{min}\) refers to the degrees of freedom associated with the variable that has fewer categories (e.g., if we have two nominal variables with 3 and 4 categories, \(df_{min}\) would be 3 - 1 = 2). The degrees of freedom need to be taken into account when judging the magnitude of the effect sizes (see e.g., here ).

Note that the correct = FALSE argument above ensures that the test statistic is computed in the same way as we have done by hand above. By default, chisq.test() applies a correction to prevent overestimation of statistical significance for small data (called the Yates’ correction). The correction is implemented by subtracting the value 0.5 from the computed difference between the observed and expected cell counts in the numerator of the test statistic. This means that the calculated test statistic will be smaller (i.e., more conservative). Although the adjustment may go too far in some instances, you should generally rely on the adjusted results, which can be computed as follows:
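For example:

```r
chisq.test(obs_freq)  # correct = TRUE is the default
```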

As you can see, the results don’t change much in our example, since the differences between the observed and expected cell frequencies are fairly large relative to the correction.

Caution is warranted when the cell counts in the contingency table are small. The usual rule of thumb is that all cell counts should be at least 5 (this may be a little too stringent though). When some cell counts are too small, you can use Fisher’s exact test using the fisher.test() function.
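For example:

```r
fisher.test(obs_freq)
```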

The Fisher test, while more conservative, also shows a significant difference between the proportions (p < 0.05). This is not surprising since the cell counts in our example are fairly large.

5.6.3 Sample size

To calculate the required sample size when comparing proportions, the power.prop.test() function can be used. For example, we could ask how large our sample needs to be if we would like to compare two groups with conversion rates of 2% and 2.5%, respectively using the conventional settings for \(\alpha\) and \(\beta\) :
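A minimal sketch using the conventional α = 0.05 and power = 0.8:

```r
power.prop.test(p1 = 0.02, p2 = 0.025, sig.level = 0.05, power = 0.8)
```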

The output tells us that we need 13809 observations per group to detect a difference of the desired size.

How to write a hypothesis for marketing experimentation


Creating your strongest marketing hypothesis

The potential for your marketing improvement depends on the strength of your testing hypotheses.

But where are you getting your test ideas from? Have you been scouring competitor sites, or perhaps pulling from previous designs on your site? The web is full of ideas and you’re full of ideas – there is no shortage of inspiration, that’s for sure.

Coming up with something you  want  to test isn’t hard to do. Coming up with something you  should  test can be hard to do.

Hard – yes. Impossible? No. Which is good news, because if you can’t create hypotheses for things that should be tested, your test results won’t mean much, and you probably shouldn’t be spending your time testing.

Taking the time to write your hypotheses correctly will help you structure your ideas, get better results, and avoid wasting traffic on poor test designs.

With this post, we’re getting advanced with marketing hypotheses, showing you how to write and structure your hypotheses to gain both business results and marketing insights!

By the time you finish reading, you’ll be able to:

  • Distinguish a solid hypothesis from a time-waster, and
  • Structure your solid hypothesis to get results  and  insights

To make this whole experience a bit more tangible, let’s track a sample idea from…well…idea to hypothesis.

Let’s say you identified a call-to-action (CTA)* while browsing the web, and you were inspired to test something similar on your own lead generation landing page. You think it might work for your users! Your idea is:

“My page needs a new CTA.”

*A call-to-action is the point where you, as a marketer, ask your prospect to do something on your page. It often includes a button or link to an action like “Buy”, “Sign up”, or “Request a quote”.

The basics: The correct marketing hypothesis format

A well-structured hypothesis provides insights whether it is proved, disproved, or the results are inconclusive.

You should never phrase a marketing hypothesis as a question. It should be written as a statement that can be rejected or confirmed.

Further, it should be a statement geared toward revealing insights – with this in mind, it helps to imagine each statement followed by a  reason :

  • Changing _______ into ______ will increase [conversion goal], because:
  • Changing _______ into ______ will decrease [conversion goal], because:
  • Changing _______ into ______ will not affect [conversion goal], because:

Each of the above sentences ends with ‘because’ to set the expectation that there will be an explanation behind the results of whatever you’re testing.

It’s important to remember to plan ahead when you create a test, and think about explaining why the test turned out the way it did when the results come in.

Level up: Moving from a good to great hypothesis

Understanding what makes an idea worth testing is necessary for your optimization team.

If your tests are based on random ideas you googled or were suggested by a consultant, your testing process still has its training wheels on. Great hypotheses aren’t random. They’re based on rationale and aim for learning.

Hypotheses should be based on themes and analysis that show potential conversion barriers.

At Conversion, we call this investigation phase the “Explore Phase”, where we use frameworks like the LIFT Model to understand the prospect’s unique perspective. (You can read more on the full optimization process here).

A well-founded marketing hypothesis should also provide you with new, testable clues about your users regardless of whether or not the test wins, loses or yields inconclusive results.

These new insights should inform future testing: a solid hypothesis can help you quickly separate worthwhile ideas from the rest when planning follow-up tests.

“Ultimately, what matters most is that you have a hypothesis going into each experiment and you design each experiment to address that hypothesis.” – Nick So, VP of Delivery

Here’s a quick tip :

If you’re about to run a test that isn’t going to tell you anything new about your users and their motivations, it’s probably not worth investing your time in.

Let’s take this opportunity to refer back to your original idea: “My page needs a new CTA.”

Ok, but  what now ? To get actionable insights from ‘a new CTA’, you need to know why it behaved the way it did. You need to ask the right question.

To test the waters, maybe you changed the copy of the CTA button on your lead generation form from “Submit” to “Send demo request”. If this change leads to an increase in conversions, it could mean that your users require more clarity about what their information is being used for.

That’s a potential insight.

Based on this insight, you could follow up with another test that adds copy around the CTA about next steps: what the user should anticipate after they have submitted their information.

For example, will they be speaking to a specialist via email? Will something be waiting for them the next time they visit your site? You can test providing more information, and see if your users are interested in knowing it!

That’s the cool thing about a good hypothesis: the results of the test, while important (of course), aren’t the only component driving your future test ideas. The insights gleaned lead to further hypotheses and insights in a virtuous cycle.

It’s based on a science

The term “hypothesis” probably isn’t foreign to you. In fact, it may bring up memories of grade-school science class; it’s a critical part of the  scientific method .

The scientific method in testing follows a systematic routine that sets ideation up to predict the results of experiments via:

  • Collecting data and information through observation
  • Creating tentative descriptions of what is being observed
  • Forming  hypotheses  that predict different outcomes based on these observations
  • Testing your  hypotheses
  • Analyzing the data, drawing conclusions and insights from the results

Don’t worry! Hypothesizing may seem ‘sciency’, but it doesn’t have to be complicated in practice.

Hypothesizing simply helps ensure the results from your tests are quantifiable, and is necessary if you want to understand how the results reflect the change made in your test.

A strong marketing hypothesis allows testers to use a structured approach in order to discover what works, why it works, how it works, where it works, and who it works on.

“My page needs a new CTA.” Is this idea in its current state clear enough to help you understand what works? Maybe. Why it works? No. Where it works? Maybe. Who it works on? No.

Your idea needs refining.

Let’s pull back and take a broader look at the lead generation landing page we want to test.

Imagine the situation: you’ve been diligent in your data collection and you notice several recurrences of Clarity pain points – meaning that there are many unclear instances throughout the page’s messaging.

Rather than focusing on the CTA right off the bat, it may be more beneficial to deal with the bigger clarity issue.

Now you’re starting to think about solving your prospects’ conversion barriers rather than just testing random ideas!

If you believe the overall page is unclear, your overarching theme of inquiry might be positioned as:

  • “Improving the clarity of the page will reduce confusion and improve [conversion goal].”

By testing a hypothesis that supports this clarity theme, you can gain confidence in the validity of it as an actionable marketing insight over time.

If the test results are negative : It may not be worth investigating this motivational barrier any further on this page. In this case, you could return to the data and look at the other motivational barriers that might be affecting user behavior.

If the test results are positive : You might want to continue to refine the clarity of the page’s message with further testing.

Typically, a test will start with a broad idea — you identify the changes to make, predict how those changes will impact your conversion goal, and write it out as a broad theme as shown above. Then, repeated tests aimed at that theme will confirm or undermine the strength of the underlying insight.

Building marketing hypotheses to create insights

You believe you’ve identified an overall problem on your landing page (there’s a problem with clarity). Now you want to understand how individual elements contribute to the problem, and the effect these individual elements have on your users.

It’s game time  – now you can start designing a hypothesis that will generate insights.

You believe your users need more clarity. You’re ready to dig deeper to find out if that’s true!

If a specific question needs answering, you should structure your test to make a single change. This isolation might ask: “What element are users most sensitive to when it comes to the lack of clarity?” and “What changes do I believe will support increasing clarity?”

At this point, you’ll want to boil down your overarching theme…

  • Improving the clarity of the page will reduce confusion and improve [conversion goal].

…into a quantifiable hypothesis that isolates key sections:

  • Changing the wording of this CTA to set expectations for users (from “submit” to “send demo request”) will reduce confusion about the next steps in the funnel and improve order completions.

Does this answer what works? Yes: changing the wording on your CTA.

Does this answer why it works? Yes: reducing confusion about the next steps in the funnel.

Does this answer where it works? Yes: on this page, before the user enters this theoretical funnel.

Does this answer who it works on? No, this question demands another isolation. You might structure your hypothesis more like this:

  • Changing the wording of the CTA to set expectations for users (from “submit” to “send demo request”) will reduce confusion  for visitors coming from my email campaign  about the next steps in the funnel and improve order completions.

Now we’ve got a clear hypothesis. And one worth testing!

What makes a great hypothesis?

1. It’s testable.

2. It addresses conversion barriers.

3. It aims at gaining marketing insights.

Let’s compare:

The original idea : “My page needs a new CTA.”

Following the hypothesis structure : “A new CTA on my page will increase [conversion goal]”

The first test implied a problem with clarity, which provides a potential theme : “Improving the clarity of the page will reduce confusion and improve [conversion goal].”

The potential clarity theme leads to a new hypothesis : “Changing the wording of the CTA to set expectations for users (from “submit” to “send demo request”) will reduce confusion about the next steps in the funnel and improve order completions.”

Final refined hypothesis : “Changing the wording of the CTA to set expectations for users (from “submit” to “send demo request”) will reduce confusion for visitors coming from my email campaign about the next steps in the funnel and improve order completions.”

Which test would you rather your team invest in?

Before you start your next test, take the time to do a proper analysis of the page you want to focus on. Do preliminary testing to define bigger issues, and use that information to refine and pinpoint your marketing hypothesis to give you forward-looking insights.

Doing this will help you avoid time-wasting tests, and enable you to start getting some insights for your team to keep testing!



A Beginner’s Guide to Hypothesis Testing in Business


Becoming a more data-driven decision-maker can bring several benefits to your organization, enabling you to identify new opportunities to pursue and threats to abate. Rather than allowing subjective thinking to guide your business strategy, backing your decisions with data can empower your company to become more innovative and, ultimately, profitable.

If you’re new to data-driven decision-making, you might be wondering how data translates into business strategy. The answer lies in generating a hypothesis and verifying or rejecting it based on what various forms of data tell you.

Below is a look at hypothesis testing and the role it plays in helping businesses become more data-driven.


What Is Hypothesis Testing?

To understand what hypothesis testing is, it’s important first to understand what a hypothesis is.

A hypothesis or hypothesis statement seeks to explain why something has happened, or what might happen, under certain conditions. It can also be used to understand how different variables relate to each other. Hypotheses are often written as if-then statements; for example, “If this happens, then this will happen.”

Hypothesis testing , then, is a statistical means of testing an assumption stated in a hypothesis. While the specific methodology leveraged depends on the nature of the hypothesis and data available, hypothesis testing typically uses sample data to extrapolate insights about a larger population.

Hypothesis Testing in Business

When it comes to data-driven decision-making, there’s a certain amount of risk that can mislead a professional. This could be due to flawed thinking or observations, incomplete or inaccurate data , or the presence of unknown variables. The danger in this is that, if major strategic decisions are made based on flawed insights, it can lead to wasted resources, missed opportunities, and catastrophic outcomes.

The real value of hypothesis testing in business is that it allows professionals to test their theories and assumptions before putting them into action. This essentially allows an organization to verify its analysis is correct before committing resources to implement a broader strategy.

As one example, consider a company that wishes to launch a new marketing campaign to revitalize sales during a slow period. Doing so could be an incredibly expensive endeavor, depending on the campaign’s size and complexity. The company, therefore, may wish to test the campaign on a smaller scale to understand how it will perform.

In this example, the hypothesis that’s being tested would fall along the lines of: “If the company launches a new marketing campaign, then it will translate into an increase in sales.” It may even be possible to quantify how much of a lift in sales the company expects to see from the effort. Pending the results of the pilot campaign, the business would then know whether it makes sense to roll it out more broadly.


Key Considerations for Hypothesis Testing

1. Alternative Hypothesis and Null Hypothesis

In hypothesis testing, the hypothesis that’s being tested is known as the alternative hypothesis . Often, it’s expressed as a correlation or statistical relationship between variables. The null hypothesis , on the other hand, is a statement that’s meant to show there’s no statistical relationship between the variables being tested. It’s typically the exact opposite of whatever is stated in the alternative hypothesis.

For example, consider a company’s leadership team that historically and reliably sees $12 million in monthly revenue. They want to understand if reducing the price of their services will attract more customers and, in turn, increase revenue.

In this case, the alternative hypothesis may take the form of a statement such as: “If we reduce the price of our flagship service by five percent, then we’ll see an increase in sales and realize revenues greater than $12 million in the next month.”

The null hypothesis, on the other hand, would indicate that revenues wouldn’t increase from the base of $12 million, or might even decrease.


2. Significance Level and P-Value

Statistically speaking, if you were to run the same scenario 100 times, you’d likely receive somewhat different results each time. If you were to plot these results in a distribution plot, you’d see the most likely outcome is at the tallest point in the graph, with less likely outcomes falling to the right and left of that point.


With this in mind, imagine you’ve completed your hypothesis test and have your results, which indicate there may be a correlation between the variables you were testing. To understand your results’ significance, you’ll need to identify a p-value for the test, which indicates how much confidence you can place in the test results.

In statistics, the p-value depicts the probability that, assuming the null hypothesis is correct, you might still observe results that are at least as extreme as the results of your hypothesis test. The smaller the p-value, the more likely the alternative hypothesis is correct, and the greater the significance of your results.

3. One-Sided vs. Two-Sided Testing

When it’s time to test your hypothesis, it’s important to leverage the correct testing method. The two most common hypothesis testing methods are one-sided and two-sided tests , or one-tailed and two-tailed tests, respectively.

Typically, you’d leverage a one-sided test when you have a strong conviction about the direction of change you expect to see due to your hypothesis test. You’d leverage a two-sided test when you’re less confident in the direction of change.


4. Sampling

To perform hypothesis testing in the first place, you need to collect a sample of data to be analyzed. Depending on the question you’re seeking to answer or investigate, you might collect samples through surveys, observational studies, or experiments.

A survey involves asking a series of questions to a random population sample and recording self-reported responses.

Observational studies involve a researcher observing a sample population and collecting data as it occurs naturally, without intervention.

Finally, an experiment involves dividing a sample into multiple groups, one of which acts as the control group. For each non-control group, the variable being studied is manipulated to determine how the data collected differs from that of the control group.


Learn How to Perform Hypothesis Testing

Hypothesis testing is a complex process involving different moving pieces that can allow an organization to effectively leverage its data and inform strategic decisions.

If you’re interested in better understanding hypothesis testing and the role it can play within your organization, one option is to complete a course that focuses on the process. Doing so can lay the statistical and analytical foundation you need to succeed.


Hypothesis Testing: Definition, Uses, Limitations + Examples

Hypothesis testing is as old as the scientific method and is at the heart of the research process. 

Research exists to validate or disprove assumptions about various phenomena. The process of validation involves testing and it is in this context that we will explore hypothesis testing. 

What is a Hypothesis? 

A hypothesis is a calculated prediction or assumption about a population parameter based on limited evidence. The whole idea behind hypothesis formulation is testing—this means the researcher subjects his or her calculated assumption to a series of evaluations to know whether they are true or false. 

Typically, every research starts with a hypothesis—the investigator makes a claim and experiments to prove that this claim is true or false . For instance, if you predict that students who drink milk before class perform better than those who don’t, then this becomes a hypothesis that can be confirmed or refuted using an experiment.  


What are the Types of Hypotheses? 

1. Simple Hypothesis

Also known as a basic hypothesis, a simple hypothesis suggests that an independent variable is responsible for a corresponding dependent variable. In other words, an occurrence of the independent variable inevitably leads to an occurrence of the dependent variable. 

Typically, simple hypotheses are considered as generally true, and they establish a causal relationship between two variables. 

Examples of Simple Hypothesis  

  • Drinking soda and other sugary drinks can cause obesity. 
  • Smoking cigarettes daily leads to lung cancer.

2. Complex Hypothesis

A complex hypothesis is also known as a modal. It accounts for the causal relationship between two independent variables and the resulting dependent variables. This means that the combination of the independent variables leads to the occurrence of the dependent variables . 

Examples of Complex Hypotheses  

  • Adults who do not smoke and drink are less likely to develop liver-related conditions.
  • Global warming causes icebergs to melt which in turn causes major changes in weather patterns.

3. Null Hypothesis

As the name suggests, a null hypothesis is formed when a researcher suspects that there’s no relationship between the variables in an observation. In this case, the purpose of the research is to confirm or disprove this assumption. 

Examples of Null Hypothesis

  • There is no significant change in a student’s performance if they drink coffee or tea before classes. 
  • There’s no significant change in the growth of a plant if one uses distilled water only or vitamin-rich water. 

4. Alternative Hypothesis 

To disprove a null hypothesis, the researcher has to come up with an opposite assumption, known as the alternative hypothesis. This means if the null hypothesis says that A is false, the alternative hypothesis assumes that A is true. 

An alternative hypothesis can be directional or non-directional depending on the direction of the difference. A directional alternative hypothesis specifies the direction of the tested relationship, stating that one variable is predicted to be larger or smaller than the null value while a non-directional hypothesis only validates the existence of a difference without stating its direction. 

Examples of Alternative Hypotheses  

  • Starting your day with a cup of tea instead of a cup of coffee can make you more alert in the morning. 
  • The growth of a plant improves significantly when it receives distilled water instead of vitamin-rich water. 

5. Logical Hypothesis

Logical hypotheses are some of the most common types of calculated assumptions in systematic investigations. A logical hypothesis is an attempt to use reasoning to connect different pieces of research and build a theory using little evidence. In this case, the researcher uses any data available to him or her to form a plausible assumption that can be tested. 

Examples of Logical Hypothesis

  • Waking up early helps you to have a more productive day. 
  • Beings from Mars would not be able to breathe the air in the atmosphere of the Earth. 

6. Empirical Hypothesis  

After forming a logical hypothesis, the next step is to create an empirical or working hypothesis. At this stage, your logical hypothesis undergoes systematic testing to prove or disprove the assumption. An empirical hypothesis is subject to several variables that can trigger changes and lead to specific outcomes. 

Examples of Empirical Testing 

  • People who eat more fish run faster than people who eat meat.
  • Women taking vitamin E grow hair faster than those taking vitamin K.

7. Statistical Hypothesis

When forming a statistical hypothesis, the researcher examines the portion of a population of interest and makes a calculated assumption based on the data from this sample. A statistical hypothesis is most common with systematic investigations involving a large target audience. Here, it’s impossible to collect responses from every member of the population so you have to depend on data from your sample and extrapolate the results to the wider population. 

Examples of Statistical Hypothesis  

  • 45% of students in Louisiana have middle-income parents. 
  • 80% of the UK’s population gets a divorce because of irreconcilable differences.

What is Hypothesis Testing? 

Hypothesis testing is an assessment method that allows researchers to determine the plausibility of a hypothesis. It involves testing an assumption about a specific population parameter to see whether it is supported by the evidence. These population parameters include the mean, variance, standard deviation, and median.

Typically, hypothesis testing starts with developing a null hypothesis and then performing several tests that support or reject the null hypothesis. The researcher uses test statistics to compare the association or relationship between two or more variables. 


Researchers also use hypothesis testing to determine whether a regression relationship or a correlation coefficient is statistically significant.

How Hypothesis Testing Works

The basis of hypothesis testing is to examine and analyze the null hypothesis and alternative hypothesis to know which one is the more plausible assumption. Since both assumptions are mutually exclusive, only one can be true: evidence for the alternative hypothesis counts against the null hypothesis, and vice versa.


What Are The Stages of Hypothesis Testing?  

To successfully confirm or refute an assumption, the researcher goes through five stages of hypothesis testing:

  • Determine the null hypothesis
  • Specify the alternative hypothesis
  • Set the significance level
  • Calculate the test statistics and corresponding P-value
  • Draw your conclusion

  • Determine the Null Hypothesis

As mentioned earlier, hypothesis testing starts with creating a null hypothesis, the default assumption that there is no effect or no difference. For example, the null hypothesis (H0) could suggest that different subgroups in the research population react to a variable in the same way.

  • Specify the Alternative Hypothesis

Once you know the variables for the null hypothesis, the next step is to determine the alternative hypothesis. The alternative hypothesis counters the null assumption by suggesting the statement or assertion is true. Depending on the purpose of your research, the alternative hypothesis can be one-sided or two-sided. 

Using the example we established earlier, the alternative hypothesis may argue that the different sub-groups react differently to the same variable based on several internal and external factors. 

  • Set the Significance Level

Many researchers set the significance level at 5%. This means accepting a 0.05 probability of rejecting the null hypothesis when it is in fact true (a Type I error).

Something to note here is that the smaller the significance level, the greater the burden of proof needed to reject the null hypothesis and support the alternative hypothesis.

  • Calculate the Test Statistics and Corresponding P-Value 

Test statistics in hypothesis testing allow you to compare groups or variables, while the p-value is the probability of obtaining sample statistics at least as extreme as those observed if the null hypothesis is true. Common test statistics include the t-statistic, the z-statistic, the F-statistic, and the chi-square statistic.

If your p-value is 0.65, for example, it means that results at least as extreme as yours would occur about 65 times in 100 by pure chance if the null hypothesis were true – very weak evidence against the null hypothesis.

  • Draw Your Conclusions

After conducting the tests, you should be able to reject or fail to reject the null hypothesis based on the evidence from your sample data.

Applications of Hypothesis Testing in Research

Hypothesis testing isn’t only confined to numbers and calculations; it also has several real-life applications in business, manufacturing, advertising, and medicine. 

In factories and other manufacturing plants, hypothesis testing is an important part of quality and production control before final products are approved and sent out to consumers.

During ideation and strategy development, C-level executives use hypothesis testing to evaluate their theories and assumptions before any form of implementation. For example, they could leverage hypothesis testing to determine whether or not some new advertising campaign, marketing technique, etc. causes increased sales. 

In addition, hypothesis testing is used during clinical trials to prove the efficacy of a drug or new medical method before its approval for widespread human usage. 

What is an Example of Hypothesis Testing?

An employer claims that her workers are of above-average intelligence. She takes a random sample of 20 of them and gets the following results: 

Mean IQ Scores: 110

Standard Deviation: 15 

Mean Population IQ: 100

Step 1: Using the value of the mean population IQ, we establish the null hypothesis: the population mean IQ is 100.

Step 2: State the alternative hypothesis: the mean IQ of the workers is greater than 100.

Step 3: State the alpha level as 0.05 or 5% 

Step 4: Find the critical value for the rejection region (given by your alpha level above) from the z-table. An upper-tail area of .05 corresponds to a z-score of 1.645.

Step 5: Calculate the test statistics using this formula

\[ z = {\bar x - \mu_0 \over s / \sqrt{n}} \]

Z = (110 – 100) ÷ (15 ÷ √20) = 10 ÷ 3.354 ≈ 2.98

If the value of the test statistic exceeds the critical value of the rejection region, you reject the null hypothesis. If it is less, you cannot reject the null.

In this case, 2.98 > 1.645, so we reject the null hypothesis.
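
The same arithmetic can be reproduced in R; the numbers below are the ones from the example above:

```r
x_bar <- 110   # sample mean IQ
mu0   <- 100   # hypothesized population mean
s     <- 15    # standard deviation
n     <- 20    # sample size

z <- (x_bar - mu0) / (s / sqrt(n))
z                  # 2.98

qnorm(0.95)        # critical value for a one-sided test at alpha = 0.05: 1.645
1 - pnorm(z)       # one-sided p-value: ~0.0014

z > qnorm(0.95)    # TRUE, so we reject the null hypothesis
```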

Importance/Benefits of Hypothesis Testing 

The most significant benefit of hypothesis testing is that it allows you to evaluate the strength of your claim or assumption before implementing it in your data set. Note, however, that hypothesis testing cannot definitively prove that something “is or is not”; it only quantifies the evidence against the null hypothesis. Other benefits include: 

  • Hypothesis testing provides a reliable framework for making any data decisions for your population of interest. 
  • It helps the researcher to successfully extrapolate data from the sample to the larger population. 
  • Hypothesis testing allows the researcher to determine whether the data from the sample is statistically significant. 
  • Hypothesis testing is one of the most important processes for measuring the validity and reliability of outcomes in any systematic investigation. 
  • It helps to provide links to the underlying theory and specific research questions.

Criticism and Limitations of Hypothesis Testing

Several limitations of hypothesis testing can affect the quality of data you get from this process. Some of these limitations include: 

  • The interpretation of a p-value for observation depends on the stopping rule and definition of multiple comparisons. This makes it difficult to calculate since the stopping rule is subject to numerous interpretations, plus “multiple comparisons” are unavoidably ambiguous. 
  • Conceptual issues often arise in hypothesis testing, especially if the researcher merges Fisher and Neyman-Pearson’s methods which are conceptually distinct. 
  • In an attempt to focus on the statistical significance of the data, the researcher might ignore the estimation and confirmation by repeated experiments.
  • Hypothesis testing can trigger publication bias, especially when it requires statistical significance as a criterion for publication.
  • When used to detect whether a difference exists between groups, hypothesis testing can trigger absurd assumptions that affect the reliability of your observation.

Hypothesis Testing in Marketing Research

Introduction to Hypothesis Testing in Marketing Research

Hypothesis testing is a critical component of marketing research that allows marketers to draw conclusions about the effectiveness of their strategies. In essence, hypothesis testing involves making an educated guess about a population parameter and then using data to determine if the hypothesis is supported or rejected. In the context of marketing, hypotheses can be formulated about consumer behavior, product preferences, advertising effectiveness, and many other aspects of the marketing mix. By conducting hypothesis tests, marketers can make informed decisions based on empirical evidence rather than intuition or guesswork.

A hypothesis test in marketing research typically follows a structured process that involves defining a null hypothesis (H0) and an alternative hypothesis (HA), collecting and analyzing data, determining the appropriate statistical test to use, setting a significance level, and interpreting the results to either reject or fail to reject the null hypothesis. The null hypothesis represents the status quo or the assumption that there is no significant difference or relationship between variables, while the alternative hypothesis suggests that there is a significant effect or relationship. By rigorously testing hypotheses, marketers can evaluate the impact of their marketing strategies and make data-driven decisions to optimize their campaigns and initiatives.

The results of hypothesis testing in marketing research provide valuable insights that can inform strategic decision-making and help marketers achieve their business objectives. Whether testing the effectiveness of a new product launch, evaluating the impact of a promotional campaign, or analyzing consumer preferences, hypothesis testing enables marketers to quantify the impact of their actions and make evidence-based recommendations. By employing statistical techniques and hypothesis testing in marketing research, organizations can gain a deeper understanding of consumer behavior, identify market trends, and refine their marketing strategies to drive business growth and success.

Key Steps and Considerations for Hypothesis Testing in Marketing Analysis

When conducting hypothesis testing in marketing research, there are several key steps and considerations that marketers should keep in mind to ensure the validity and reliability of their findings. Firstly, it is essential to clearly define the research question and formulate testable hypotheses that are specific, measurable, and relevant to the marketing objectives. By articulating clear hypotheses, marketers can establish a framework for data collection and analysis that aligns with the research objectives.

Once the hypotheses have been formulated, the next step is to determine the appropriate research design and methodology for data collection. Depending on the nature of the research question and the variables involved, marketers may choose to conduct experiments, surveys, observational studies, or other research methods to gather data. It is crucial to ensure that the data collected is representative of the target population and is collected in a systematic and unbiased manner to generate reliable results.

After collecting the data, marketers can perform statistical analysis to test the hypotheses using techniques such as t-tests, ANOVA, regression analysis, or chi-square tests, among others. It is important to select the appropriate statistical test based on the type of data and the research question being investigated. Additionally, setting a significance level (alpha) is crucial, as it determines the threshold for rejecting the null hypothesis. By interpreting the results in the context of the significance level, marketers can make informed decisions about the implications of the findings and their impact on marketing strategies.
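
For instance, a chi-square test of independence can check whether ad exposure and purchase are related. A minimal R sketch with made-up counts (not from any real campaign):

```r
# Hypothetical campaign data: ad exposure vs. purchase
ad_data <- matrix(c(120, 380,    # exposed:     purchased / did not purchase
                     80, 420),   # not exposed: purchased / did not purchase
                  nrow = 2, byrow = TRUE,
                  dimnames = list(exposure = c("exposed", "not exposed"),
                                  outcome  = c("purchased", "no purchase")))

# H0: purchase behavior is independent of ad exposure
chisq.test(ad_data)
```

With alpha set at 0.05, a p-value below that threshold would lead us to reject the assumption of independence.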


A/B Testing in Digital Marketing: Example of four-step hypothesis framework

by Daniel Burstein , Senior Director, Content & Marketing, MarketingSherpa and MECLABS Institute


This article was originally published in the MarketingSherpa email newsletter.

If you are a marketing expert — whether in a brand’s marketing department or at an advertising agency — you may feel the need to be absolutely sure in an unsure world.

What should the headline be? What images should we use? Is this strategy correct? Will customers value this promo?

This is the stuff you’re paid to know. So you may feel like you must boldly proclaim your confident opinion.

But you can’t predict the future with 100% accuracy. You can’t know with absolute certainty how humans will behave. And let’s face it, even as marketing experts we’re occasionally wrong.

That doubt isn’t bad – it’s healthy. And the most effective way to overcome it is by testing our marketing creative to see what really works.

Developing a hypothesis

After we published Value Sequencing: A step-by-step examination of a landing page that generated 638% more conversions, a MarketingSherpa reader emailed us and asked…

Great stuff Daniel. Much appreciated. I can see you addressing all the issues there.

I thought I saw one more opportunity to expand on what you made. Would you consider adding the IF, BY, WILL, BECAUSE to the control/treatment sections so we can see what psychology you were addressing so we know how to create the hypothesis to learn from what the customer is currently doing and why and then form a test to address that? The video today on customer theory was great (Editor’s Note: Part of the MarketingExperiments YouTube Live series ) . I think there is a way to incorporate that customer theory thinking into this article to take it even further.

Developing a hypothesis is an essential part of marketing experimentation. Qualitative-based research should inform hypotheses that you test with real-world behavior.

The hypotheses help you discover how accurate those insights from qualitative research are. If you engage in hypothesis-driven testing, then you ensure your tests are strategic (not just based on a random idea) and built in a way that enables you to learn more and more about the customer with each test.

And that methodology will ultimately lead to greater and greater lifts over time, instead of a scattershot approach where sometimes you get a lift and sometimes you don’t, but you never really know why.

Here is a handy tool to help you in developing hypotheses — the MECLABS Four-Step Hypothesis Framework.

As the reader suggests, I will use the landing page test referenced in the previous article as an example. ( Please note: While the experiment in that article was created with a hypothesis-driven approach, this specific four-step framework is fairly new and was not in common use by the MECLABS team at that time, so I have created this specific example after the test was developed based on what I see in the test).

Here is what the hypothesis would look like for that test, and then we’ll break down each part individually:

If we emphasize the process-level value by adding headlines, images and body copy, we will generate more leads because the value of a longer landing page in reducing the anxiety of calling a TeleAgent outweighs the additional friction of a longer page.


IF: Summary description

The hypothesis begins with an overall statement about what you are trying to do in the experiment. In this case, the experiment is trying to emphasize the process-level value proposition (one of the four essential levels of value proposition ) of having a phone call with a TeleAgent.

The control landing page was emphasizing the primary value proposition of the brand itself.

The treatment landing page is essentially trying to answer this value proposition question: If I am your ideal customer, why should I call a TeleAgent rather than take any other action to learn more about my Medicare options?

The control landing page was asking a much bigger question that customers weren’t ready to say “yes” to yet, and it was overlooking the anxiety inherent in getting on a phone call with someone who might try to sell you something: If I am your ideal customer, why should I buy from your company instead of any other company?

This step answers WHAT you are trying to do.

BY: Remove, add, change

The next step answers HOW you are going to do it.

As Flint McGlaughlin, CEO and Managing Director of MECLABS Institute teaches, there are only three ways to improve performance: removing, adding or changing .

In this case, the team focused mostly on adding — adding headlines, images and body copy that highlighted the TeleAgents as trusted advisors.

“Adding” can be counterintuitive for many marketers. The team’s original landing page was short. Conventional wisdom says customers won’t read long landing pages. When I’m presenting to a group of marketers, I’ll put a short and long landing page on a slide and ask which page they think achieved better results.

Invariably I will hear, “Oh, the shorter page. I would never read something that long.”

That first-person statement is a mistake. Your marketing creative should not be based on “I” — the marketer. It should be based on “they” — the customer.

Most importantly, you need to focus on the customer at a specific point in time – when he or she is considering taking an action, like purchasing a product, or needs more information before deciding to download a whitepaper. And sometimes in these situations, longer landing pages perform better.

In the case of this landing page, even the customer may not necessarily favor a long landing page all the time. But in the real-world situation when they are considering whether to call a TeleAgent or not, the added value helps more customers decide to take the action.

WILL: Improve performance

This is your KPI (key performance indicator). This step answers another HOW question: How do you know your hypothesis has been supported or refuted?

You can choose secondary metrics to monitor during your test as well. This might help you interpret the customer behavior observed in the test.

But ultimately, the hypothesis should rest on a single metric.

For this test, the goal was to generate more leads. And the treatment did — 638% more leads.

BECAUSE: Customer insight

This last step answers a WHY question — why did the customers act this way?

This helps you determine what you can learn about customers based on the actions observed in the experiment.

This is ultimately why you test. To learn about the customer and continually refine your company’s customer theory .

In this case, the team theorized that the value of a longer landing page in reducing the anxiety of calling a TeleAgent outweighs the additional friction of a longer landing page.

And the test results support that hypothesis.


9 Key stages in your marketing research process

You can conduct your own marketing research. Follow these steps, add your own flair, knowledge and creativity, and you’ll have bespoke research to be proud of.

Marketing research is the term used to cover the concept, development, placement and evolution of your product or service, its growing customer base and its branding – starting with brand awareness, and progressing to (everyone hopes) brand equity. Like any research, it needs a robust process to be credible and useful.

Marketing research uses four key factors known as the ‘marketing mix’, or the Four Ps of Marketing:

  • Product (goods or service)
  • Price (how much the customer pays)
  • Place (where the product is marketed)
  • Promotion (such as advertising and PR)

These four factors need to work in harmony for a product or service to be successful in its marketplace.

The marketing research process – an overview

A typical marketing research process is as follows:

  • Identify an issue, discuss alternatives and set out research objectives
  • Develop a research program
  • Choose a sample
  • Gather information
  • Gather data
  • Organize and analyze information and data
  • Present findings
  • Make research-based decisions
  • Take action based on insights

Step 1: Defining the marketing research problem

Defining a problem is the first step in the research process. In many ways, research starts with a problem facing management. This problem needs to be understood, the cause diagnosed, and solutions developed.

However, most management problems are not easy to research directly, so they must first be translated into research problems. Once you approach the problem from a research angle, you can find a solution. For example, “sales are not growing” is a management problem, but translated into a research problem it becomes “why are sales not growing?” We can look at the expectations and experiences of several groups: potential customers, first-time buyers, and repeat purchasers. We can question whether the lack of sales is due to:

  • Poor expectations that lead to a general lack of desire to buy, or
  • Poor performance experience and a lack of desire to repurchase.

This, then, is the difference between a management problem and a research problem. Solving management problems focuses on actions: Do we advertise more? Do we change our advertising message? Do we change an under-performing product configuration? And if so, how?

Defining research problems, on the other hand, focuses on the whys and hows, providing the insights you need to solve your management problem.

Step 2: Developing a research program: method of inquiry

The scientific method is the standard for investigation. It provides an opportunity for you to use existing knowledge as a starting point, and proceed impartially.

The scientific method includes the following steps:

  • Define a problem
  • Develop a hypothesis
  • Make predictions based on the hypothesis
  • Devise a test of the hypothesis
  • Conduct the test
  • Analyze the results

This terminology is similar to the stages in the research process. However, there are subtle differences in the way the steps are performed:

  • the scientific research method is objective and fact-based, using quantitative research and impartial analysis
  • the marketing research process can be subjective, using opinion and qualitative research, as well as personal judgment as you collect and analyze data

Step 3: Developing a research program: research method

As well as selecting a method of inquiry (objective or subjective), you must select a research method . There are two primary methodologies that can be used to answer any research question:

  • Experimental research: gives you the advantage of controlling extraneous variables and manipulating one or more variables that influence the process being implemented.
  • Non-experimental research: allows observation but not intervention – all you do is observe and report on your findings.

Step 4: Developing a research program: research design

Research design is a plan or framework for conducting marketing research and collecting data. It is defined as the specific methods and procedures you use to get the information you need.

There are three core types of marketing research designs: exploratory, descriptive, and causal. A thorough marketing research process incorporates elements of all of them.

Exploratory marketing research

This is a starting point for research. It’s used to reveal facts and opinions about a particular topic, and gain insight into the main points of an issue. Exploratory research is too much of a blunt instrument to base conclusive business decisions on, but it gives the foundation for more targeted study. You can use secondary research materials such as trade publications, books, journals and magazines, as well as primary qualitative research methods such as open-text surveys, interviews and focus groups.

Descriptive marketing research

This helps define the business problem or issue so that companies can make decisions, take action and monitor progress. Descriptive research is naturally quantitative – it needs to be measured and analyzed statistically, using more targeted surveys and questionnaires. You can use it to capture demographic information, evaluate a product or service for market, and monitor a target audience’s opinion and behaviors. Insights from descriptive research can inform conclusions about the market landscape and the product’s place in it.

Causal marketing research

This is useful to explore the cause-and-effect relationship between two or more variables. Like descriptive research, it uses quantitative methods, but it doesn’t merely report findings; it uses experiments to predict and test theories about a product or market. For example, researchers may change product packaging design or material, and measure what happens to sales as a result.

Step 5: Choose your sample

Your marketing research project will rarely examine an entire population. It’s more practical to use a sample - a smaller but accurate representation of the greater population. To design your sample, you’ll need to answer these questions:

  • Which base population is the sample to be selected from? Once you’ve established who your relevant population is (your research design process will have revealed this), you have a base for your sample. This will allow you to make inferences about a larger population.
  • What is the method (process) for sample selection? There are two methods of selecting a sample from a population:

1. Probability sampling : This relies on a random sampling of everyone within the larger population.

2. Non-probability sampling: This is based in part on the investigator’s judgment and often uses convenience samples or other sampling methods that do not rely on probability.

  • What is your sample size? This important step involves cost and accuracy decisions. Larger samples generally reduce sampling error and increase accuracy, but also increase costs. Find out your perfect sample size with our calculator; a minimal R sketch of the standard calculation follows below.
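
As an illustration (the margin of error, confidence level, and sampling frame below are assumptions for the sketch, not prescriptions), the standard sample-size formula for a proportion, plus a simple probability draw, looks like this in R:

```r
# Sample size for estimating a proportion to within +/- 3 percentage points
# at 95% confidence, assuming the worst case p = 0.5
z <- qnorm(0.975)                      # 1.96
E <- 0.03                              # desired margin of error
p <- 0.5
n <- ceiling(z^2 * p * (1 - p) / E^2)  # 1068

# Probability sampling: draw n respondents at random from a hypothetical frame
frame <- paste0("customer_", 1:50000)
respondents <- sample(frame, size = n)
```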

Step 6: Gather data

Your research design will develop as you select techniques to use. There are many channels for collecting data, and it’s helpful to differentiate it into O-data (Operational) and X-data (Experience):

  • O-data is your business’s hard numbers like costs, accounting, and sales. It tells you what has happened, but not why.
  • X-data gives you insights into the thoughts and emotions of the people involved: employees, customers, brand advocates.

When you combine O-data with X-data, you’ll be able to build a more complete picture of success and failure – you’ll know why. Maybe you’ve seen a drop in sales (O-data) for a particular product. Maybe customer service was lacking, the product was out of stock, or advertisements weren’t impactful or different enough: X-data will reveal why those sales dropped. So while differentiating these two data sets is important, it is when they are combined and work with each other that the insights become powerful.

With mobile technology, it has become easier than ever to collect data. Survey research has come a long way since market researchers conducted face-to-face, postal, or telephone surveys. You can run research through:

  • Social media (polls and listening)

Another way to collect data is by observation. Observing a customer’s or company’s past or present behavior can predict future purchasing decisions. Data collection techniques for predicting past behavior can include market segmentation , customer journey mapping and brand tracking .

Regardless of how you collect data, the process introduces another essential element to your research project: the importance of clear and constant communication .

And of course, to analyze information from survey or observation techniques, you must record your results. Gone are the days of spreadsheets: feedback from surveys and listening channels can automatically feed into AI-powered analytics engines and produce results, in real time, on dashboards.

Step 7: Analysis and interpretation

The words ‘statistical analysis methods’ aren’t usually guaranteed to set a room alight with excitement, but when you understand what they can do, the problems they can solve and the insights they can uncover, they seem a whole lot more compelling.

Statistical tests and data processing tools can reveal:

  • Whether data trends you see are meaningful or are just chance results
  • Your results in the context of other information you have
  • Whether one thing affecting your business is more significant than others
  • What your next research area should be
  • Insights that lead to meaningful changes

There are several types of statistical analysis tools used for surveys. You should make sure that the ones you choose:

  • Work on any platform - mobile, desktop, tablet etc.
  • Integrate with your existing systems
  • Are easy to use with user-friendly interfaces, straightforward menus, and automated data analysis
  • Incorporate statistical analysis so you don’t just process and present your data, but refine it, and generate insights and predictions.

Here are some of the most common tools:

  • Benchmarking: a way of taking outside factors into account so that you can adjust the parameters of your research. It ‘levels the playing field’ – so that your data and results are more meaningful in context – and gives you a more precise understanding of what’s happening.
  • Regression analysis: used for working out the relationship between two (or more) variables. It is useful for identifying the precise impact of a change in an independent variable.
  • T-test: used for comparing two data groups which have different mean values. For example, do women and men have different mean heights? (See the R sketch after this list.)
  • Analysis of variance (ANOVA): similar to the t-test, ANOVA is a way of testing the differences between three or more independent groups to see if they’re statistically significant.
  • Cluster analysis: organizes items into groups, or clusters, based on how closely associated they are.
  • Factor analysis: a way of condensing many variables into just a few, so that your research data is less unwieldy to work with.
  • Conjoint analysis: helps you understand and predict why people make the choices they do. It asks people to make trade-offs when making decisions, just as they do in the real world, then analyzes the results to give the most popular outcome.
  • Crosstab analysis: a quantitative market research tool used to analyze ‘categorical data’ – variables that are different and mutually exclusive, such as ‘men’ and ‘women’, or ‘under 30’ and ‘over 30’.
  • Text analysis and sentiment analysis: analyzing human language and emotions is a rapidly developing form of data processing, assigning positive, negative or neutral sentiment to customer messages and feedback.
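
Several of these tests are one-liners in R. A minimal sketch of the first two listed comparisons, using simulated data (the means and spreads are assumptions purely for illustration):

```r
set.seed(7)

# T-test: do men and women have different mean heights?
heights <- data.frame(
  height = c(rnorm(50, mean = 178, sd = 7), rnorm(50, mean = 165, sd = 7)),
  sex    = rep(c("men", "women"), each = 50)
)
t.test(height ~ sex, data = heights)

# ANOVA: do mean scores differ across three customer segments?
segments <- data.frame(
  score   = c(rnorm(30, 50, 8), rnorm(30, 53, 8), rnorm(30, 49, 8)),
  segment = rep(c("A", "B", "C"), each = 30)
)
summary(aov(score ~ segment, data = segments))
```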

Stats iQ can perform the most complicated statistical tests at the touch of a button using our online survey software, or data from other sources. Learn more about Stats iQ now.

Step 8: The marketing research results

Your marketing research process culminates in the research results. These should provide all the information the stakeholders and decision-makers need to understand the project.

The results will include:

  • all your information
  • a description of your research process
  • the results
  • conclusions
  • recommended courses of action

They should also be presented in a form, language and graphics that are easy to understand, with a balance between completeness and conciseness – neither leaving important information out nor getting so technical that it overwhelms the readers.

Traditionally, you would prepare two written reports:

  • a technical report, discussing the methods, underlying assumptions and the detailed findings of the research project
  • a summary report, which summarizes the research process and presents the findings and conclusions simply.

There are now more engaging ways to present your findings than the traditional PowerPoint presentations, graphs, and face-to-face reports:

  • Live, interactive dashboards for sharing the most important information, as well as tracking a project in real time.
  • Results-reports visualizations – tables or graphs with data visuals on a shareable slide deck
  • Online presentation technology, such as Prezi
  • Visual storytelling with infographics
  • A single-page executive summary with key insights
  • A single-page stat sheet with the top-line stats

You can also make these results shareable so that decision-makers have all the information at their fingertips.

Step 9: Turn your insights into action

Insights are one thing, but they’re worth very little unless they inform immediate, positive action. Here are a few examples of how you can do this:

  • Stop customers leaving – negative sentiment among VIP customers gets picked up; the customer service team contacts the customers, resolves their issues, and avoids churn.
  • Act on important employee concerns – you can set certain topics, such as safety, or diversity and inclusion to trigger an automated notification or Slack message to HR. They can rapidly act to rectify the issue.
  • Address product issues – maybe deliveries are late, maybe too many products are faulty. When product feedback gets picked up through Smart Conversations, messages can be triggered to the delivery or product teams to jump on the problems immediately.
  • Improve your marketing effectiveness - Understand how your marketing is being received by potential customers, so you can find ways to better meet their needs
  • Grow your brand - Understand exactly what consumers are looking for, so you can make sure that you’re meeting their expectations

Scott Smith, Ph.D. is a contributor to the Qualtrics blog.

Expert Advice on Developing a Hypothesis for Marketing Experimentation 

Simbar Dube

Every marketing experimentation process has to have a solid hypothesis. 

That’s a must – unless you want to be roaming in the dark and heading towards a dead-end in your experimentation program.

Hypothesizing is the second phase of our SHIP optimization process here at Invesp.


It comes after we have completed the research phase. 

This is an indication that we don’t just pull a hypothesis out of thin air. We always make sure that it is based on research data. 

But having a research-backed hypothesis doesn’t mean that the hypothesis will always be correct. In fact, tons of hypotheses bear inconclusive results or get disproved. 

The main idea of having a hypothesis in marketing experimentation is to help you gain insights – regardless of the testing outcome. 

By the time you finish reading this article, you’ll know: 

  • The essential tips on what to do when crafting a hypothesis for marketing experiments
  • How a marketing experiment hypothesis works
  • How experts develop a solid hypothesis

The Basics: Marketing Experimentation Hypothesis

A hypothesis is a research-based statement that aims to explain an observed trend and create a solution that will improve the result. This statement is an educated, testable prediction about what will happen.

It has to be stated in declarative form and not as a question.

“If we add magnification info, a product video, and virtual mirror buttons, will that improve engagement?” is not declarative, but “Improving the experience of product pages by adding magnification info, a product video, and virtual mirror buttons will increase engagement” is.

Here’s a quick example of how a hypothesis should be phrased: 

  • Replacing ___ with __ will increase [conversion goal] by [%], because:
  • Removing ___ and __ will decrease [conversion goal] by [%], because:
  • Changing ___ into __ will not affect [conversion goal], because:
  • Improving ___ by ___ will increase [conversion goal], because: 

As you can see from the above sentences, a good hypothesis is written in clear and simple language. Reading your hypothesis should tell your team members exactly what you thought was going to happen in an experiment.

Another important element of a good hypothesis is that it defines the variables in easy-to-measure terms, like who the participants are, what changes during the testing, and what the effect of the changes will be: 

Example: Let’s say this is our hypothesis: 

Displaying full-look items on every “continue shopping & view your bag” pop-up and highlighting the value of having a full look will improve the visibility of a full look, encourage visitors to add multiple items from the same look, and thereby increase the average order value and order quantity through cross-selling by 3%.

Who are the participants: 

Visitors. 

What changes during the testing: 

Displaying full-look items on every “continue shopping & view your bag” pop-up and highlighting the value of having a full look…

What the effect of the changes will be:  

Will improve the visibility of a full look, encourage visitors to add multiple items from the same look, and thereby increase the average order value and order quantity through cross-selling by 3%.
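
Because the effect is stated in measurable terms, it maps directly onto a statistical test once the experiment has run. A minimal R sketch with simulated order values (the dollar amounts and sample sizes are assumptions for illustration, not Invesp data):

```r
# Simulated average-order-value (AOV) data, in dollars
set.seed(3)
aov_control   <- rexp(400, rate = 1 / 80)  # control: mean around $80
aov_variation <- rexp(400, rate = 1 / 82)  # variation: mean around $82 (~3% higher)

# One-sided test of the predicted AOV increase
t.test(aov_variation, aov_control, alternative = "greater")
```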

Don’t bite off more than you can chew! Answering some scientific questions can involve more than one experiment, each with its own hypothesis. So you have to make sure your hypothesis is a specific statement relating to a single experiment.

How a Marketing Experimentation Hypothesis Works

Assume that you have done conversion research and identified a list of issues (UX or conversion-related problems) and potential revenue opportunities on the site. The next thing you’d want to do is prioritize the issues and determine which will most impact the bottom line.

Having ranked the issues, you need to test them to determine which solution works best. At this point, you don’t have a clear solution for the problems identified. So, to get better results and avoid wasting traffic on poor test designs, you need to make sure that your testing plan is guided. 

This is where a hypothesis comes into play. 

For each and every problem you’re aiming to address, you need to craft a hypothesis for it – unless the problem is a technical issue that can be solved right away without the need to hypothesize or test. 

One important thing you should note about an experimentation hypothesis is that it can be implemented in different ways.  

This means that one hypothesis can have four or five different tests aligned to it. Khalid Saleh, the Invesp CEO, explains: 

“There are several ways that can be used to support one single hypothesis. Each and every way is a possible test scenario. And that means you also have to prioritize the test design you want to start with. Ultimately the name of the game is you want to find the idea that has the biggest possible impact on the bottom line with the least amount of effort. We use almost 18 different metrics to score all of those.”

In one of the recent tests we launched after watching video recordings, viewing heatmaps, and conducting expert reviews, we noticed that:  

  • Visitors were scrolling to the bottom of the page to fill out a calculator to get a free diet plan. 
  • Branding was missing.
  • There were too many free diet plans, which made it hard for visitors to choose and understand.  
  • There was no value proposition on the page.
  • The copy didn’t mention the benefits of the paid program.
  • There was no clear CTA for the next action.

To help you understand, here is what the original page looked like before we worked on it: 

[Screenshot: the original opt-in landing page]

So our aim was to make the shopping experience seamless for visitors and to make the page more appealing and less confusing. To do that, here is how we phrased the hypothesis for the page above: 

Improving the experience of opt-in landing pages by making the free offer accessible above the fold and highlighting the next action with a clear CTA will increase the engagement on the offer and increase the conversion rate by 1%.

For this particular hypothesis, we had two design variations aligned to it:

[Screenshots: the two design variations]

The two above designs are different, but they are aligned to one hypothesis. This goes on to show how one hypothesis can be implemented in different ways. Looking at the two variations above – which one do you think won?

Yes, you’re right, V2 was the winner. 
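
Calling a winner like this normally rests on a significance test of the two conversion rates. Here is a minimal R sketch; the visitor and conversion counts are invented for illustration, not the actual test data:

```r
# Hypothetical results: control vs. winning variation (V2)
conversions <- c(312, 361)    # conversions in control and V2
visitors    <- c(9800, 9750)  # visitors exposed to each version

# Two-proportion test of H0: equal conversion rates
prop.test(conversions, visitors)
```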

Considering that there are many ways you can implement one hypothesis, when you launch a test and it fails, it doesn’t necessarily mean that the hypothesis was wrong. Khalid adds:

“A single failure of a test doesn’t mean that the hypothesis is incorrect. Nine times out of ten it’s because of the way you’ve implemented the hypothesis. Look at the way you’ve coded and look at the copy you’ve used – you are more likely going to find something wrong with it. Always be open.” 

So there are three things you should keep in mind when it comes to marketing experimentation hypotheses: 

  • It can take several test iterations to fully evaluate a hypothesis.
  • A single failure doesn’t necessarily mean that the hypothesis is incorrect.
  • Whether a hypothesis is proved or disproved, you can still learn something about your users.

I know it’s never easy to develop a hypothesis that informs future testing – I mean it takes a lot of intense research behind the scenes, and tons of ideas to begin with. So, I reached out to six CRO experts for tips and advice to help you understand more about developing a solid hypothesis and what to include in it. 

Maurice says that a solid hypothesis should have no more than one goal: 

Maurice Beerthuyzen – CRO/CXO Lead at ClickValue

“Creating a hypothesis doesn’t begin at the hypothesis itself. It starts with research. What do you notice in your data, customer surveys, and other sources? Do you understand what happens on your website? When you notice an opportunity, it is tempting to base one single A/B test on one hypothesis. Create hypothesis A and run a single test, and then move forward to the next test, with another hypothesis. But it is very rare that you solve your problem with only one hypothesis. Often a test raises several other questions – questions which you can solve by running other tests, but based on that same hypothesis! We should not come up with a new hypothesis for every test.

Another mistake that often happens is that we fill the hypothesis with multiple goals. We expect that the hypothesis will work on conversion rate, average order value, and/or click-through ratio. Of course, this is possible, but when you run your test, your hypothesis can only have one goal at once. And what if you have two goals? Just split the hypothesis: create a secondary hypothesis for your second goal. Every test has one primary goal. What if you find a winner on your secondary hypothesis? Rerun the test with the second hypothesis as the primary one.”

Jon believes that a strong hypothesis is built upon three pillars:

Jon MacDonald – President and Founder of The Good

“Respond to an established challenge – The challenge must have a strong background based on data, and the background should state an established challenge that the test is looking to address. Example: ‘Sign up form lacks proof of value, incorrectly assuming if users are on the page, they already want the product.’

Propose a specific solution – What is the one, single thing that is believed will address the stated challenge? Example: ‘Adding an image of the dashboard as a background to the signup form…’

State the assumed impact – The assumed impact should reference one specific, measurable optimization goal that was established prior to forming a hypothesis. Example: ‘…will increase signups.’

So, if your hypothesis doesn’t have a specific, measurable goal like ‘will increase signups,’ you’re not really stating a test hypothesis!”

Matt uses his own hypothesis builder to collate important data points into a single hypothesis. 

Matt Beischel – Founder of Corvus CRO

Like Jon, Matt breaks down his hypothesis writing process into three sections. Unlike Jon’s, Matt’s sections are Comprehension, Response, and Outcome:

“I set it up so that the names neatly match the ‘CRO.’ It’s a sort of ‘mad-libs’ style fill-in-the-blank where each input is an important piece of information for building out a robust hypothesis. I consider these the minimum required data points for a good hypothesis; if you can’t completely fill out the form, then you don’t have a good hypothesis. Here’s a breakdown of each data point:

Comprehension – identifying something that can be improved upon
  • Problem: ‘What is a problem we have?’
  • Observation Method: ‘How did we identify the problem?’

Response – a change that can cause improvement
  • Variation: ‘What change do we think could solve the problem?’
  • Location: ‘Where should the change occur?’
  • Scope: ‘What are the conditions for the change?’
  • Audience: ‘Who should the change affect?’

Outcome – a measurable result of the change that determines success
  • Behavior Change: ‘What change in behavior are we trying to affect?’
  • Primary KPI: ‘What is the important metric that determines business impact?’
  • Secondary KPIs: ‘Other metrics that will help reinforce/refute the Primary KPI’

Something else to consider is that I have a ‘user first’ approach to formulating hypotheses. My process above is always considered within the context of how it would first benefit the user. Now, I do feel that a successful experiment should satisfy the needs of BOTH users and businesses, but always be in favor of the user. Notice that ‘Behavior Change’ is the first thing listed in Outcome, not primary business KPI. Sure, at the end of the day you are working for the business’s best interests (both strategically and financially), but placing the user first will better inform your decision making and prioritization; there’s a reason that things like personas, user stories, surveys, session replays, reviews, etc. exist after all. A business-first ideology is how you end up with dark patterns and damaging brand credibility.”

One of the many mistakes that CROs make when writing a hypothesis is that they are focused on wins and not on insights. Shiva advises against this mindset:

Shiva Manjunath – Marketing Manager and CRO at Gartner

“Test to learn, not test to win. It’s a very simple reframe of hypotheses but can have a magnitude of difference. Here’s an example:

Test-to-win hypothesis: If I put a product video in the middle of the product page, I will improve add-to-cart rates and improve CVR.

Test-to-learn hypothesis: If I put a product video on the product page, there will be high engagement with the video and it will positively influence traffic.

What you’re doing is framing your hypothesis, and test, in a particular way to learn as much as you can. That is where you gain marketing insights. The more you run ‘marketing insight’ tests, the more you will win. Why? The more you compound marketing insight learnings, your win velocity will start to increase as a proxy of the learnings you’ve achieved. Then, you’ll have a higher chance of winning in your tests – and the more you’ll be able to drive business results.”

Lorenzo says it’s okay to focus on achieving a certain result as long as you are also getting an answer to: “Why is this event happening or not happening?”

Lorenzo Carreri – CRO Consultant

“When I come up with a hypothesis for a new or iterative experiment, I always try to find an answer to a question. It could be something related to a problem people have, an opportunity to achieve a result, or a way to learn something.

The main question I want to answer is: ‘Why is this event happening or not happening?’ The question is driven by data, both qualitative and quantitative.

The structure I use for stating my hypothesis is: From [data source], I noticed [this problem/opportunity] among [this audience of users] on [this page or multiple pages]. So I believe that by [offering this experiment solution], [this KPI] will [increase/decrease/stay the same].”

Jakub Linowski says that hypotheses are meant to hold researchers accountable:

Jakub Linowski – Chief Editor of GoodUI

"They do this by making your change and prediction more explicit. A typical hypothesis may be expressed as: If we change (X), then it will have some measurable effect (A). Unfortunately, this oversimplified format can also become a heavy burden to your experiment design with its extreme reductionism. However you decide to format your hypotheses, here are three suggestions for more flexibility to avoid limiting yourself.

One Or More Changes

To break out of the first limitation, we have to admit that our experiments may contain a single change or multiple changes. Whereas the classic hypothesis encourages a single change or isolated variable, it's not the only way we can run experiments. In the real world, it's quite normal to see multiple design changes inside a single variation. One valid reason for doing this is when wishing to optimize a section of a website while aiming for a greater effect. As more positive changes compound together, there are times when teams decide to run bigger experiments. An experiment design (along with your hypotheses) therefore should allow for both single and multiple changes.

One Or More Metrics

A second limitation of many hypotheses is that they often ask us to make only a single prediction at a time. There are times when we might like to make multiple guesses or predictions against a set of metrics. A simple example of this might be a trade-off experiment with a guess of increased sales but decreased trial signups. Being able to express single or multiple metrics in our experimental designs should therefore be possible.

Estimates, Directional Predictions, Or Unknowns

Finally, traditional hypotheses also tend to force very simple directional predictions by asking us to guess whether something will increase or decrease. In reality, however, the fidelity of predictions can be higher or lower. On one hand, I've seen and made experiment estimations that contain specific numbers from prior data (ex: increase sales by 14%). At other times it should also be acceptable to admit the unknown and leave the prediction blank. One example of this is when we are testing a completely novel idea without any prior data in a highly exploratory type of experiment. In such cases, it might be dishonest to make any sort of prediction, and we should allow ourselves to express the unknown comfortably."

Conclusion 

So there you have it! Before you jump on launching a test, start by making sure that your hypothesis is solid and backed by research. Ask yourself the questions below when crafting a hypothesis for marketing experimentation:

  • Is the hypothesis backed by research?
  • Can the hypothesis be tested?
  • Does the hypothesis provide insights?
  • Does the hypothesis set the expectation that there will be an explanation behind the results of whatever you’re testing?

Don't worry! Hypothesizing may seem like a very complicated process, but it's not complicated in practice, especially when you have done proper research.


From Hypothesis to Results: Mastering the Art of Marketing Experiments


Suppose you’re trying to convince your friend to watch your favorite movie. You could either tell them about the intriguing plot or show them the exciting trailer.

To find out which approach works best, you try both methods with different friends and see which one gets more people to watch the movie.

Marketing experiments work in much the same way, allowing businesses to test different marketing strategies, gather feedback from their target audience, and make data-driven decisions that lead to improved outcomes and growth.

By testing different approaches and measuring their outcomes, companies can identify what works best for their unique target audience and adapt their marketing strategies accordingly. This leads to more efficient use of marketing resources and results in higher conversion rates, increased customer satisfaction, and, ultimately, business growth.

Marketing experiments are the backbone of building an organization’s culture of learning and curiosity, encouraging employees to think outside the box and challenge the status quo.

In this article, we will delve into the fundamentals of marketing experiments, discussing their key elements and various types. By the end, you’ll be in a position to start running these tests and securing better marketing campaigns with explosive results.

Why Digital Marketing Experiments Matter

One of the most effective ways to drive growth and optimize marketing strategies is through digital marketing experiments. These experiments provide invaluable insights into customer preferences, behaviors, and the overall effectiveness of marketing efforts, making them an essential component of any digital marketing strategy.

Digital marketing experiments matter for several reasons:

  • Customer-centric approach: By conducting experiments, businesses can gain a deeper understanding of their target audience’s preferences and behaviors. This enables them to tailor their marketing efforts to better align with customer needs, resulting in more effective and engaging campaigns.
  • Data-driven decision-making: Marketing experiments provide quantitative data on the performance of different marketing strategies and tactics. This empowers businesses to make informed decisions based on actual results rather than relying on intuition or guesswork. Ultimately, this data-driven approach leads to more efficient allocation of resources and improved marketing outcomes.
  • Agility and adaptability: Businesses must be agile and adaptable to keep up with emerging trends and technologies. Digital marketing experiments allow businesses to test new ideas, platforms, and strategies in a controlled environment, helping them stay ahead of the curve and quickly respond to changing market conditions.
  • Continuous improvement: Digital marketing experiments facilitate an iterative process of testing, learning, and refining marketing strategies. This ongoing cycle of improvement enables businesses to optimize their marketing efforts, drive better results, and maintain a competitive edge in the digital marketplace.
  • ROI and profitability: By identifying which marketing tactics are most effective, businesses can allocate their marketing budget more efficiently and maximize their return on investment. This increased profitability can be reinvested into the business, fueling further growth and success.

Developing a culture of experimentation allows businesses to continuously improve their marketing strategies, maximize their ROI, and avoid being left behind by the competition.

The Fundamentals of Digital Marketing Experiments

Marketing experiments are structured tests that compare different marketing strategies, tactics, or assets to determine which one performs better in achieving specific objectives.

These experiments use a scientific approach, which involves formulating hypotheses, controlling variables, gathering data, and analyzing the results to make informed decisions.

Marketing experiments provide valuable insights into customer preferences and behaviors, enabling businesses to optimize their marketing efforts and maximize returns on investment (ROI).

There are several types of marketing experiments that businesses can use, depending on their objectives and available resources.

The most common types include:

A/B testing

A/B testing, also known as split testing, is a simple yet powerful technique that compares two variations of a single variable to determine which one performs better.

In an A/B test, the target audience is randomly divided into two groups: one group is exposed to version A (the control), while the other group is exposed to version B (the treatment). The performance of both versions is then measured and compared to identify the one that yields better results.

A/B testing can be applied to various marketing elements, such as headlines, calls-to-action, email subject lines, landing page designs, and ad copy. The primary advantage of A/B testing is its simplicity, making it easy for businesses to implement and analyze.
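To make this concrete, here is a minimal sketch in R (the language used elsewhere in this document) of how the outcome of such a test might be evaluated; the visitor and conversion counts are hypothetical.

```r
# Hypothetical A/B test results: visitors and conversions per variation
visitors    <- c(A = 5000, B = 5000)
conversions <- c(A = 400, B = 460)   # 8.0% vs. 9.2% conversion rate

# Two-proportion test: is the difference between A and B larger than
# what random assignment alone would plausibly produce?
result <- prop.test(x = conversions, n = visitors)

result$p.value   # below 0.05 => treat the difference as statistically significant
result$conf.int  # 95% confidence interval for the difference in rates
```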

Multivariate testing

Multivariate testing is a more advanced technique that allows businesses to test multiple variables simultaneously.

In a multivariate test, several elements of a marketing asset are modified and combined to create different versions. These versions are then shown to different segments of the target audience, and their performance is measured and compared to determine the most effective combination of variables.

Multivariate testing is beneficial when optimizing complex marketing assets, such as websites or email templates, with multiple elements that may interact with one another. However, this method requires a larger sample size and more advanced analytical tools compared to A/B testing.
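As an illustration, a minimal 2×2 multivariate test (two headline variants crossed with two image variants) could be analyzed in R with a logistic regression that includes an interaction term; the data below are simulated, and the effect sizes are assumptions rather than results from any real test.

```r
set.seed(42)
n <- 8000

# Simulated 2x2 multivariate test: headline and image each have two variants
df <- data.frame(
  headline = sample(c("H1", "H2"), n, replace = TRUE),
  image    = sample(c("I1", "I2"), n, replace = TRUE)
)

# Assume a baseline 8% conversion rate, a small lift for H2,
# and an extra lift when H2 is combined with I2 (an interaction)
p <- 0.08 + 0.01 * (df$headline == "H2") +
     0.015 * (df$headline == "H2" & df$image == "I2")
df$converted <- rbinom(n, 1, p)

# The interaction term tests whether elements behave differently in combination
fit <- glm(converted ~ headline * image, family = binomial, data = df)
summary(fit)
```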

Pre-post analysis

Pre-post analysis involves comparing the performance of a marketing strategy before and after implementing a change.

This type of experiment is often used when it is not feasible to conduct an A/B or multivariate test, such as when the change affects the entire customer base or when there are external factors that cannot be controlled.

While pre-post analysis can provide useful insights, it is less reliable than A/B or multivariate testing because it does not account for potential confounding factors. To obtain accurate results from a pre-post analysis, businesses must carefully control for external influences and ensure that the observed changes are indeed due to the implemented modifications.
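A pre-post comparison can be run with the same two-proportion machinery, but the interpretation is weaker. The before/after counts in this sketch are hypothetical:

```r
# Hypothetical site-wide change: conversions in the four weeks before
# and the four weeks after the change went live
prop.test(x = c(before = 820, after = 905),
          n = c(before = 10200, after = 10350))

# Caution: even a significant difference here may be driven by seasonality,
# concurrent campaigns, or other external factors, because visitors were
# not randomly assigned to the "before" and "after" conditions.
```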

How To Start Growth Marketing Experiments

To conduct effective marketing experiments, businesses must pay attention to the following key elements:

Clear objectives

Having clear objectives is crucial for a successful marketing experiment. Before starting an experiment, businesses must identify the specific goals they want to achieve, such as increasing conversions, boosting engagement, or improving click-through rates. Clear objectives help guide the experimental design and ensure the results are relevant and actionable.

Hypothesis-driven approach

A marketing experiment should be based on a well-formulated hypothesis that predicts the expected outcome. A reasonable hypothesis is specific, testable, and grounded in existing knowledge or data. It serves as the foundation for experimental design and helps businesses focus on the most relevant variables and outcomes.

Proper experimental design

A marketing experiment requires a well-designed test that controls for potential confounding factors and ensures the reliability and validity of the results. This includes the random assignment of participants, controlling for external influences, and selecting appropriate variables to test. Proper experimental design increases the likelihood that observed differences are due to the tested variables and not other factors.

Adequate sample size

A successful marketing experiment requires an adequate sample size to ensure the results are statistically significant and generalizable to the broader target audience. The required sample size depends on the type of experiment, the expected effect size, and the desired level of confidence. In general, larger sample sizes provide more reliable and accurate results but may also require more resources to conduct the experiment.
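For proportion metrics such as conversion rate, R's built-in power.prop.test shows how the required sample size follows from the baseline rate, the minimum lift worth detecting, and the desired significance and power; the rates below are illustrative assumptions.

```r
# Sample size per variation to detect a lift from 8% to 9% conversion,
# at a 5% significance level with 80% power
power.prop.test(p1 = 0.08, p2 = 0.09, sig.level = 0.05, power = 0.80)
# => roughly 12,000 visitors per group; detecting smaller lifts
#    requires substantially larger samples
```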

Data-driven analysis

Marketing experiments rely on a data-driven analysis of the results. This involves using statistical techniques to determine whether the observed differences between the tested variations are significant and meaningful. Data-driven analysis helps businesses make informed decisions based on empirical evidence rather than intuition or gut feelings.

By understanding the fundamentals of marketing experiments and following best practices, businesses can gain valuable insights into customer preferences and behaviors, ultimately leading to improved outcomes and growth.

Setting up Your First Marketing Experiment

Embarking on your first marketing experiment can be both exciting and challenging. Following a systematic approach, you can set yourself up for success and gain valuable insights to improve your marketing efforts.

Here’s a step-by-step guide to help you set up your first marketing experiment.

Identifying your marketing objectives

Before diving into your experiment, it’s essential to establish clear marketing objectives. These objectives will guide your entire experiment, from hypothesis formulation to data analysis.

Consider what you want to achieve with your marketing efforts, such as increasing website conversions, improving email open rates, or boosting social media engagement.

Make sure your objectives are specific, measurable, achievable, relevant, and time-bound (SMART) to ensure that they are actionable and provide meaningful insights.

Formulating a hypothesis

With your marketing objectives in mind, the next step is formulating a hypothesis for your experiment. A hypothesis is a testable prediction that outlines the expected outcome of your experiment. It should be based on existing knowledge, data, or observations and provide a clear direction for your experimental design.

For example, suppose your objective is to increase email open rates. In that case, your hypothesis might be, “Adding the recipient’s first name to the email subject line will increase the open rate by 10%.” This hypothesis is specific, testable, and clearly linked to your marketing objective.

Designing the experiment

Once you have a hypothesis in place, you can move on to designing your experiment. This involves several key decisions:

Choosing the right testing method:

Select the most appropriate testing method for your experiment based on your objectives, hypothesis, and available resources.

As discussed earlier, common testing methods include A/B, multivariate, and pre-post analyses. Choose the method that best aligns with your goals and allows you to effectively test your hypothesis.

Selecting the variables to test:

Identify the specific variables you will test in your experiment. These should be directly related to your hypothesis and marketing objectives. In the email open rate example, the variable to test would be the subject line, specifically the presence or absence of the recipient’s first name.

When selecting variables, consider their potential impact on your marketing objectives and prioritize those with the greatest potential for improvement. Also, ensure that the variables are easily measurable and can be manipulated in your experiment.

Identifying the target audience:

Determine the target audience for your experiment, considering factors such as demographics, interests, and behaviors. Your target audience should be representative of the larger population you aim to reach with your marketing efforts.

When segmenting your audience for the experiment, ensure that the groups are as similar as possible to minimize potential confounding factors.

In A/B or multivariate testing, this can be achieved through random assignment, which helps control for external influences and ensures a fair comparison between the tested variations.

Executing the experiment

With your experiment designed, it’s time to put it into action.

This involves several key considerations:

Timing and duration:

Choose the right timing and duration for your experiment based on factors such as the marketing channel, target audience, and the nature of the tested variables.

The duration of the experiment should be long enough to gather a sufficient amount of data for meaningful analysis but not so long that it negatively affects your marketing efforts or causes fatigue among your target audience.

In general, aim for a duration that allows you to reach a predetermined sample size or achieve statistical significance. This may vary depending on the specific experiment and the desired level of confidence.

Monitoring the experiment:

During the experiment, monitor its progress and performance regularly to ensure that everything is running smoothly and according to plan. This includes checking for technical issues, tracking key metrics, and watching for any unexpected patterns or trends.

If any issues arise during the experiment, address them promptly to prevent potential biases or inaccuracies in the results. Additionally, avoid making changes to the experimental design or variables during the experiment, as this can compromise the integrity of the results.

Analyzing the results

Once your experiment has concluded, it’s time to analyze the data and draw conclusions.

This involves two key aspects:

Statistical significance:

Statistical significance is a measure of the likelihood that the observed differences between the tested variations are due to the variables being tested rather than random chance. To determine statistical significance, you will need to perform a statistical test, such as a t-test or chi-squared test, depending on the nature of your data.

Generally, a result is considered statistically significant if the probability of observing a difference at least as large as the one measured, assuming there is no real effect (the p-value), is less than a predetermined threshold, often set at 0.05 or 5%. Note that this is a statement about how surprising the data would be under the null hypothesis, not a 95% guarantee that the observed difference is real.
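As a worked illustration, the chi-squared test mentioned above can be applied to a contingency table of hypothetical experiment results:

```r
# Hypothetical results of a concluded experiment
tab <- matrix(c(400, 4600,    # control:   converted / not converted
                460, 4540),   # treatment: converted / not converted
              nrow = 2, byrow = TRUE,
              dimnames = list(c("control", "treatment"),
                              c("converted", "not_converted")))

test <- chisq.test(tab)
test$p.value          # probability of a difference this large under "no effect"
test$p.value < 0.05   # TRUE => reject the null hypothesis at the 5% level
```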

Practical significance:

While statistical significance is crucial, it’s also essential to consider the practical significance of your results. This refers to the real-world impact of the observed differences on your marketing objectives and business goals.

To assess practical significance, consider the effect size of the observed difference (e.g., the percentage increase in email open rates) and the potential return on investment (ROI) of implementing the winning variation. This will help you determine whether the experiment results are worth acting upon and inform your marketing decisions moving forward.

A systematic approach to designing growth marketing experiments helps you to design, execute, and analyze your experiment effectively, ultimately leading to better marketing outcomes and business growth.

Examples of Successful Marketing Experiments

In this section, we will explore three fictional case studies of successful marketing experiments that led to improved marketing outcomes. These examples will demonstrate the practical application of marketing experiments across different channels and provide valuable lessons that can be applied to your own marketing efforts.

Example 1: Redesigning a website for increased conversions

AcmeWidgets, an online store selling innovative widgets, noticed that its website conversion rate had plateaued.

They conducted a marketing experiment to test whether a redesigned landing page could improve conversions. They hypothesized that a more visually appealing and user-friendly design would increase conversion rates by 15%.

AcmeWidgets used A/B testing to compare their existing landing page (the control) with a new, redesigned version (the treatment). They randomly assigned website visitors to one of the two landing pages and tracked conversions over a period of four weeks.

At the end of the experiment, AcmeWidgets found that the redesigned landing page had a conversion rate 18% higher than the control. The results were statistically significant, and the company decided to implement the new design across its entire website.

As a result, AcmeWidgets experienced a substantial increase in sales and revenue.

Example 2: Optimizing email marketing campaigns

EcoTravel, a sustainable travel agency, wanted to improve the open rates of their monthly newsletter. They hypothesized that adding a sense of urgency to the subject line would increase open rates by 10%.

To test this hypothesis, EcoTravel used A/B testing to compare two different subject lines for their newsletter:

  • “Discover the world’s most beautiful eco-friendly destinations” (control)
  • “Last chance to book: Explore the world’s most beautiful eco-friendly destinations” (treatment)

EcoTravel sent the newsletter to a random sample of their subscribers. Half received the control subject line, and the other half received the treatment. They then tracked the open rates for both groups over one week.

The results of the experiment showed that the treatment subject line, which included a sense of urgency, led to a 12% increase in open rates compared to the control.

Based on these findings, EcoTravel incorporated a sense of urgency in their future email subject lines to boost newsletter engagement.

Example 3: Improving social media ad performance

FitFuel, a meal delivery service for fitness enthusiasts, was looking to improve its Facebook ad campaign’s click-through rate (CTR). They hypothesized that using an image of a satisfied customer enjoying a FitFuel meal would increase CTR by 8% compared to their current ad featuring a meal image alone.

FitFuel conducted an A/B test on their Facebook ad campaign, comparing the performance of the control ad (meal image only) with the treatment ad (customer enjoying a meal). They targeted a similar audience with both ad variations and measured the CTR over two weeks. The experiment revealed that the treatment ad, featuring the customer enjoying a meal, led to a 10% increase in CTR compared to the control ad. FitFuel decided to update its Facebook ad campaign with the new image, resulting in a more cost-effective campaign and a higher return on investment.

Lessons learned from these examples

These fictional examples of successful marketing experiments highlight several key takeaways:

  • Clearly defined objectives and hypotheses: In each example, the companies had specific marketing objectives and well-formulated hypotheses, which helped guide their experiments and ensure relevant and actionable results.
  • Proper experimental design: Each company used the appropriate testing method for their experiment and carefully controlled variables, ensuring accurate and reliable results.
  • Data-driven decision-making: The companies analyzed the data from their experiments to make informed decisions about implementing changes to their marketing strategies, ultimately leading to improved outcomes.
  • Continuous improvement: These examples demonstrate that marketing experiments can improve marketing efforts continuously. By regularly conducting experiments and applying the lessons learned, businesses can optimize their marketing strategies and stay ahead of the competition.
  • Relevance across channels: Marketing experiments can be applied across various marketing channels, such as website design, email campaigns, and social media advertising. Regardless of the channel, the principles of marketing experimentation remain the same, making them a valuable tool for marketers in diverse industries.

By learning from these fictional examples and applying the principles of marketing experimentation to your own marketing efforts, you can unlock valuable insights, optimize your marketing strategies, and achieve better results for your business.

Common Pitfalls of Marketing Experiments and How to Avoid Them

Conducting marketing experiments can be a powerful way to optimize your marketing strategies and drive better results.

However, it’s important to be aware of common pitfalls that can undermine the effectiveness of your experiments. In this section, we will discuss some of these pitfalls and provide tips on how to avoid them.

Insufficient sample size

An insufficient sample size can lead to unreliable results and limit the generalizability of your findings. When your sample size is too small, you run the risk of not detecting meaningful differences between the tested variations or incorrectly attributing the observed differences to random chance.

To avoid this pitfall, calculate the required sample size for your experiment based on factors such as the expected effect size, the desired level of confidence, and the type of statistical test you will use.

In general, larger sample sizes provide more reliable and accurate results but may require more resources to conduct the experiment. Consider adjusting your experimental design or testing methods to accommodate a larger sample size if necessary.
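For continuous metrics (such as average order value), the analogous built-in helper in R is power.t.test; the effect size and standard deviation below are assumptions you would replace with estimates from your own historical data.

```r
# Sample size per group to detect a $2 difference in average order value,
# assuming the metric has a standard deviation of about $25
power.t.test(delta = 2, sd = 25, sig.level = 0.05, power = 0.80)
# => roughly 2,450 orders per group; halving the detectable difference
#    roughly quadruples the required sample size
```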

Lack of clear objectives

Your marketing experiment may not provide meaningful or actionable insights without clear objectives. Unclear objectives can lead to poorly designed experiments, irrelevant variables, or difficulty interpreting the results.

To prevent this issue, establish specific, measurable, achievable, relevant, and time-bound (SMART) objectives before starting your experiment. These objectives should guide your entire experiment, from hypothesis formulation to data analysis, and ensure that your findings are relevant and useful for your marketing efforts.

Confirmation bias

Confirmation bias occurs when you interpret the results of your experiment in a way that supports your pre-existing beliefs or expectations. This can lead to inaccurate conclusions and suboptimal marketing decisions.

To minimize confirmation bias, approach your experiments with an open mind and be willing to accept results that challenge your assumptions.

Additionally, involve multiple team members in the data analysis process to ensure diverse perspectives and reduce the risk of individual biases influencing the interpretation of the results.

Overlooking external factors

External factors, such as changes in market conditions, seasonal fluctuations, or competitor actions, can influence the results of your marketing experiment and potentially confound your findings. Ignoring these factors may lead to inaccurate conclusions about the effectiveness of your marketing strategies.

To account for external factors, carefully control for potential confounding variables during the experimental design process. This might involve using random assignment, testing during stable periods, or controlling for known external influences.

Consider running follow-up experiments or analyzing historical data to confirm your findings and rule out the impact of external factors.

Tips for avoiding these pitfalls

By being aware of these common pitfalls and following best practices, you can ensure the success of your marketing experiments and obtain valuable insights for your marketing efforts. Here are some tips to help you avoid these pitfalls:

  • Plan your experiment carefully: Invest time in the planning stage to establish clear objectives, calculate an adequate sample size, and design a robust experiment that controls for potential confounding factors.
  • Use a hypothesis-driven approach: Formulate a specific, testable hypothesis based on existing knowledge or data to guide your experiment and focus on the most relevant variables and outcomes.
  • Monitor your experiment closely: Regularly check the progress of your experiment, address any issues that arise, and ensure that your experiment is running smoothly and according to plan.
  • Analyze your data objectively: Use statistical techniques to determine the significance of your results and consider the practical implications of your findings before making marketing decisions.
  • Learn from your experiments: Apply the lessons learned from your experiments to continuously improve your marketing strategies and stay ahead of the competition.

By avoiding these common pitfalls and following best practices, you can increase the effectiveness of your marketing experiments, gain valuable insights into customer preferences and behaviors, and ultimately drive better results for your business.

Building a Culture of Experimentation

To truly reap the benefits of marketing experiments, it’s essential to build a culture of experimentation within your organization. This means fostering an environment where curiosity, learning, data-driven decision-making, and collaboration are valued and encouraged.

Encouraging curiosity and learning within your organization

Cultivating curiosity and learning starts with leadership. Encourage your team to ask questions, explore new ideas, and embrace a growth mindset.

Promote ongoing learning by providing resources, such as training programs, workshops, or access to industry events, that help your team stay up-to-date with the latest marketing trends and techniques.

Create a safe environment where employees feel comfortable sharing their ideas and taking calculated risks. Emphasize the importance of learning from both successes and failures and treat every experiment as an opportunity to grow and improve.

Adopting a data-driven mindset

A data-driven mindset is crucial for successful marketing experimentation. Encourage your team to make decisions based on data rather than relying on intuition or guesswork. This means analyzing the results of your experiments objectively, using statistical techniques to determine the significance of your findings, and considering the practical implications of your results before making marketing decisions.

To foster a data-driven culture, invest in the necessary tools and technologies to collect, analyze, and visualize data effectively. Train your team on how to use these tools and interpret the data to make informed marketing decisions.

Regularly review your data-driven efforts and adjust your strategies as needed to continuously improve and optimize your marketing efforts.

Integrating experimentation into your marketing strategy

Establish a systematic approach to conducting marketing experiments to fully integrate experimentation into your marketing strategy. This might involve setting up a dedicated team or working group responsible for planning, executing, and analyzing experiments or incorporating experimentation as a standard part of your marketing processes.

Create a roadmap for your marketing experiments that outlines each project’s objectives, hypotheses, and experimental designs. Monitor the progress of your experiments and adjust your roadmap as needed based on the results and lessons learned.

Ensure that your marketing team has the necessary resources, such as time, budget, and tools, to conduct experiments effectively. Set clear expectations for the role of experimentation in your marketing efforts and emphasize its importance in driving better results and continuous improvement.

Collaborating across teams for a holistic approach

Marketing experiments often involve multiple teams within an organization, such as design, product, sales, and customer support. Encourage cross-functional collaboration to ensure a holistic approach to experimentation and leverage each team’s unique insights and expertise.

Establish clear communication channels and processes for sharing information and results from your experiments. This might involve regular meetings, shared documentation, or internal presentations to keep all stakeholders informed and engaged.

Collaboration also extends beyond your organization. Connect with other marketing professionals, industry experts, and thought leaders to learn from their experiences, share your own insights, and stay informed about the latest trends and best practices in marketing experimentation.

By building a culture of experimentation within your organization, you can unlock valuable insights, optimize your marketing strategies, and drive better results for your business.

Encourage curiosity and learning, adopt a data-driven mindset, integrate experimentation into your marketing strategy, and collaborate across teams to create a strong foundation for marketing success.

If you’re new to marketing experiments, don’t be intimidated—start small and gradually expand your efforts as your confidence grows. By embracing a curious and data-driven mindset, even small-scale experiments can lead to meaningful insights and improvements.

As you gain experience, you can tackle more complex experiments and further refine your marketing strategies.

Remember, continuous learning and improvement is the key to success in marketing experimentation. By regularly conducting experiments, analyzing the results, and applying the lessons learned, you can stay ahead of the competition and drive better results for your business.

So, take the plunge and start experimenting today—your marketing efforts will be all the better.



Case Study: Role of Marketing Mix

This research article looks at tourism of Lake Samosir in North Sumatra, Indonesia. The underlying question is whether the implementation of the marketing mix influenced tourism in the area.

4.3. Hypothesis Testing

As explained in the previous chapter, there are 5 hypotheses in this study. The hypothesis testing analysis is carried out at a significance level of 5%, which gives a critical t-value of ±1.96. A hypothesis is supported if the obtained t-value is ≥ 1.96 and not supported if the obtained t-value is < 1.96. The following table presents the hypothesis tests that answer the overall questions of the study:
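The ±1.96 cutoff is simply the two-sided 5% critical value of the standard normal distribution (which large-sample t-values approximate), as a quick check in R confirms:

```r
qnorm(1 - 0.05 / 2)    # two-sided 5% critical value: 1.959964
2 * (1 - pnorm(3.78))  # two-sided p-value for a t-value of 3.78, as reported below
```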

Table 2: Hypothesis Testing of Research Model (table not reproduced here)

Based on Table 2, which summarizes the hypothesis test results for the research model, the conclusions are as follows:

a) Marketing Mix has a positive effect on Tourist Satisfaction

Based on the structural model results, the t-value is 3.78, which is greater than 1.96, so the marketing mix variable has a significant positive effect on tourist satisfaction. Thus, hypothesis 1 is accepted: the more positively the marketing mix is perceived, the higher tourist satisfaction will be.

b) Service Quality has a positive effect on Tourist Satisfaction

Based on the structural model results, the t-value is 5.94, which is greater than 1.96, so the service quality variable has a significant positive effect on tourist satisfaction: the higher the perceived service quality, the higher tourist satisfaction will be.

c) Marketing Mix has a positive effect on Tourist Loyalty

Based on the structural model results, the t-value is 4.19, which is greater than 1.96, so the marketing mix variable has a significant positive effect on tourist loyalty: the more positively the marketing mix is perceived, the higher tourist loyalty will be.

d) Service Quality has a positive effect on Tourist Loyalty

Based on the structural model results, the t-value is 3.23, which is greater than 1.96, so the service quality variable has a significant positive effect on tourist loyalty: the higher the perceived service quality, the higher tourist loyalty will be.

e) Satisfaction has a positive effect on Tourist Loyalty

Based on the structural model results, the t-value is 3.16, which is greater than 1.96, so the satisfaction variable has a significant positive effect on tourist loyalty: the higher the perceived satisfaction, the higher tourist loyalty will be.

Hypothesis Testing Of Mediation (Indirect Effects) 

As explained in the previous chapter, this study includes two mediation hypotheses involving the tourist satisfaction variable. The hypothesis testing analysis is carried out at a significance level of 5%, which gives a critical t-value of ±1.96. A hypothesis is supported if the obtained t-value is ≥ 1.96 and not supported if the obtained t-value is < 1.96.

The following table presents the hypothesis tests for the indirect effects.

Table 3: Testing of Indirect Influence Hypotheses (table not reproduced here)

Based on the LISREL output for the structural model, the t-values show that tourist satisfaction mediates the effects of marketing mix and service quality on tourist loyalty: both indirect effects have t-values greater than 1.96 (2.50 and 3.00, respectively).
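The article reports only the resulting t-values for the indirect effects. As background, one common way such a t-value can be computed by hand is the Sobel test; the sketch below uses hypothetical path coefficients and standard errors, not the study's actual estimates.

```r
# Sobel test for an indirect effect a*b, where a is the path from the
# predictor to the mediator and b is the path from the mediator to the outcome.
# All numbers below are hypothetical placeholders.
a <- 0.30; se_a <- 0.10   # e.g., marketing mix -> tourist satisfaction
b <- 0.45; se_b <- 0.12   # e.g., tourist satisfaction -> tourist loyalty

indirect <- a * b
se_ab    <- sqrt(b^2 * se_a^2 + a^2 * se_b^2)  # Sobel standard error
z        <- indirect / se_ab

z                        # compare against the critical value of 1.96
2 * (1 - pnorm(abs(z)))  # two-sided p-value
```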


A/B Testing: Example of a good hypothesis


Want to know the secret to always running successful tests?

The answer is to formulate a hypothesis .

Now when I say it’s always successful, I’m not talking about always increasing your Key Performance Indicator (KPI). You can “lose” a test, but still be successful.

That sounds like an oxymoron, but it's not. If you set up your test strategically, even if the test decreases your KPI, you gain a learning, which is a success! And, if you win, you simultaneously achieve a lift and a learning. Double win!

The way you ensure you have a strategic test that will produce a learning is by centering it around a strong hypothesis.

So, what is a hypothesis?

By definition, a hypothesis is a proposed statement made on the basis of limited evidence that can be proved or disproved and is used as a starting point for further investigation.

Let’s break that down:

It is a proposed statement.

  • A hypothesis is not fact, and should not be argued as right or wrong until it is tested and proven one way or the other.

It is made on the basis of limited (but hopefully some ) evidence.

  • Your hypothesis should be informed by as much knowledge as you have. This should include data that you have gathered, any research you have done, and the analysis of the current problems you have performed.

It can be proved or disproved.

  • A hypothesis pretty much says, “I think by making this change , it will cause this effect .” So, based on your results, you should be able to say “this is true” or “this is false.”

It is used as a starting point for further investigation.

  • The key word here is starting point . Your hypothesis should be formed and agreed upon before you make any wireframes or designs as it is what guides the design of your test. It helps you focus on what elements to change, how to change them, and which to leave alone.

How do I write a hypothesis?

The structure of your basic hypothesis follows a CHANGE: EFFECT framework.

[Image: the basic CHANGE: EFFECT hypothesis template]

While this is a truly scientific and testable template, it is very open-ended. Even though this hypothesis, “Changing an English headline into a Spanish headline will increase clickthrough rate,” is perfectly valid and testable, if your visitors are English-speaking, it probably doesn’t make much sense.

So now the question is …

How do I write a GOOD hypothesis?

To quote my boss Tony Doty, "This isn't Mad Libs."

We can’t just start plugging in nouns and verbs and conclude that we have a good hypothesis. Your hypothesis needs to be backed by a strategy. And, your strategy needs to be rooted in a solution to a problem .

So, a more complete version of the above template would be something like this:

[Image: expanded hypothesis template rooted in a presumed problem and proposed solution]

In order to have a good hypothesis, you don’t necessarily have to follow this exact sentence structure, as long as it is centered around three main things:

Presumed problem

Proposed solution

Anticipated result

After you've completed your analysis and research, identify the problem that you will address. While you need to be very clear about what you think the problem is, you should leave it out of the hypothesis itself, since it is harder to prove or disprove. You may want to come up with both a problem statement and a hypothesis.

For example:

Problem Statement: "The lead generation form is too long, causing unnecessary friction."

Hypothesis: "By changing the number of form fields from 20 to 10, we will increase the number of leads."

When you are thinking about the solution you want to implement, you need to think about the psychology of the customer. What psychological impact is your proposed problem causing in the mind of the customer?

For example, if your proposed problem is “There is a lack of clarity in the sign-up process,” the psychological impact may be that the user is confused.

Now think about what solution is going to address the problem in the customer’s mind. If they are confused, we need to explain something better, or provide them with more information. For this example, we will say our proposed solution is to “Add a progress bar to the sign-up process.”  This leads straight into the anticipated result.

If we reduce the confusion in the visitor’s mind (psychological impact) by adding the progress bar, what do we foresee to be the result? We are anticipating that it would be more people completing the sign-up process. Your proposed solution and your KPI need to be directly correlated.

Note: Some people will include the psychological impact in their hypothesis. This isn't necessarily wrong, but we do have to be careful with assumptions. If we say that the effect will be "Reduced confusion and therefore an increase in conversion rate," we are assuming the reduced confusion is what made the impact. While this may be correct, it is not measurable and it is hard to prove or disprove.

To summarize, your hypothesis should follow a structure of: “If I change this, it will have this effect,” but should always be informed by an analysis of the problems and rooted in the solution you deemed appropriate.


Reader comments on this article:

Good article. I have been researching different approaches to writing testing hypotheses and this has been a help. The only thing I would add is that it can be useful to capture the insight/justification within the hypothesis statement. IF I do this, THEN I expect this result BECAUSE I have this insight.


Good article – but technically you can never prove an hypothesis, according to the principle of falsification (Popper), only fail to disprove the null hypothesis.


Hypothesis Testing in Business Analytics – A Beginner’s Guide


Introduction  

Organizations must understand how their decisions can impact the business in this data-driven age. Hypothesis testing enables organizations to analyze and examine the causes and effects of their decisions before making important management decisions. According to Harvard Business School Online, before making any decision, organizations like to explore the advantages of hypothesis testing and investigate decisions in a proper "laboratory" setting. By performing such tests, organizations can be more confident in their decisions. Read on to learn all about hypothesis testing, one of the essential concepts in Business Analytics.

What Is Hypothesis Testing?  

To learn about hypothesis testing, it is crucial that you first understand what the term hypothesis means.

A hypothesis statement, or hypothesis, tries to explain why something happened or what may happen under specific conditions. A hypothesis can also help us understand how different variables are connected to each other. Hypotheses are generally framed as if-then statements; for example, "If something specific were to happen, then a specific condition will come true, and vice versa." Hypothesis testing, in turn, is a statistical method of testing an assumption that has been stated in a hypothesis.

Becoming a decision-maker who is driven by data can bring several advantages to an organization, such as allowing one to recognize new opportunities to pursue and reducing the number of threats. In analytics, a hypothesis is an assumption or supposition made about a specific population parameter, that is, a fixed measurement or quantity describing the population's distribution. Common examples of parameters used in hypothesis testing are the variance and the mean. In simpler words, hypothesis testing in business analytics is a method that helps researchers, scientists, or anyone, for that matter, test the legitimacy or authenticity of their hypotheses or claims about real-world events.

To understand an example of hypothesis testing in business analytics, consider a restaurant owner interested in learning how adding extra house sauce to their chicken burgers can impact customer satisfaction. Or consider a social media marketing organization, where a hypothesis test could be set up to explain how an increase in labor impacts productivity. Thus, hypothesis testing aims to discover the connection between two or more variables in an experimental setting.

How Does Hypothesis Testing Work?  

Generally, every piece of research begins with a hypothesis: the investigator makes a claim and runs an experiment to prove that the claim is true or false. For example, if you claim that students who drink milk before class accomplish tasks better than those who do not, this is a hypothesis that can be refuted or confirmed using an experiment. There are different kinds of hypotheses:

  • Simple Hypothesis: A simple hypothesis, also known as a basic hypothesis, proposes that an independent variable is accountable for a corresponding dependent variable: the occurrence of the independent variable results in the existence of the dependent variable. Generally, simple hypotheses are thought of as true, and they posit a causal relationship between the two variables. One example of a simple hypothesis: smoking cigarettes daily leads to cancer.
  • Complex Hypothesis: This type of hypothesis, also termed a modal hypothesis, holds for a relationship between two independent variables and a resulting dependent variable: a combination of independent variables results in the dependent variable. An example of this kind of hypothesis: "Adults who don't drink and smoke are less likely to have liver-related problems."
  • Null Hypothesis: A null hypothesis is created when a researcher thinks that there is no connection between the variables being observed. An example of this kind of hypothesis: "A student's performance is not impacted if they drink tea or coffee before classes."
  • Alternative Hypothesis: If a researcher wants to disprove a null hypothesis, the researcher has to develop an opposite assumption, known as an alternative hypothesis. For example: starting your day with tea instead of coffee can keep you more alert.
  • Logical Hypothesis: A proposed explanation supported by scant data is called a logical hypothesis. Generally, you test such postulations by converting a logical hypothesis into an empirical one. For example: waking early helps one to have a productive day.
  • Empirical Hypothesis: This type of hypothesis is based on real evidence, verifiable by observation, as opposed to something that is correct in theory or by some kind of reckoning or logic. This kind of hypothesis depends on various variables that can result in specific outcomes. For example: individuals eating more fish can run faster than those eating meat.
  • Statistical Hypothesis: This kind of hypothesis is most common in systematic investigations that involve a large target audience. For example: in Louisiana, 45% of students have middle-income parents.

Four Steps of Hypothesis Testing  

There are four main steps in hypothesis testing in business analytics:

Step 1: State the Null and Alternate Hypothesis  

After stating the initial research hypothesis, it is essential to restate it as a null hypothesis (H0) and an alternate hypothesis (Ha) so that it can be tested mathematically.

Step 2: Collate Data  

For a test to be valid, it is essential to do some sampling and collate data in a manner designed to test the hypothesis. If your data are not representative, then statistical inferences cannot be made about the population you are trying to analyze.  

Step 3: Perform a Statistical Test  

Various statistical tests exist, but all of them rely on comparing within-group variance (how spread out the data are within a group) against between-group variance (how dissimilar the groups are from one another).

Step 4: Decide to Reject or Accept Your Null Hypothesis  

Based on the result of your statistical test, you need to decide whether you want to accept or reject your null hypothesis.  

Hypothesis Testing in Business   

When we talk about data-driven decision-making, a certain amount of risk can deceive a professional. This could result from flawed observations or reasoning, inaccurate or incomplete information, or unknown variables. The threat here is that if key strategic decisions are made on incorrect insights, it can lead to catastrophic outcomes for an organization. The real importance of hypothesis testing is that it enables professionals to test their assumptions and theories before putting them into action. This enables an organization to confirm the accuracy of its analysis before making key decisions.

Key Considerations for Hypothesis Testing  

Let us look at the following key considerations of hypothesis testing:  

  • Alternative Hypothesis and Null Hypothesis : If a researcher wants to disapprove of a null hypothesis, then the researcher has to develop an opposite assumption—known as an alternative hypothesis. A null hypothesis is created when a researcher thinks that there is no connection between the variables that are being observed.  
  • Significance Level and P-Value : The statistical significance level is generally expressed as a p-value that lies between 0 and 1. The lesser the p-value, the more it suggests that you reject the null hypothesis. A p-value of less than 0.05 (generally ≤ 0.05) is significant statistically.  
  • One-Sided vs. Two-Sided Testing: One-sided tests consider the possibility of an effect in a single direction only. Two-sided tests test for the likelihood of an effect in both directions—negative and positive. One-sided tests have more statistical power to detect an effect in a single direction than a two-sided test with the same significance level and design.
  • Sampling: For hypothesis testing, you are required to collate a sample of data to be examined. In hypothesis testing, an analyst tests a statistical sample with the aim of assessing the plausibility of the null hypothesis. Statistical analysts test a hypothesis by examining and measuring a random sample of the population being examined.

Real-World Example of Hypothesis Testing  

The following two examples give a glimpse of the various situations in which hypothesis testing is used in real-world scenarios.  

Example: BioSciences  

Hypothesis tests are frequently used in the biological sciences. For example, consider a biologist who believes that a certain fertilizer will improve plant growth, which currently averages 10 inches. To test this, the fertilizer is applied to the plants in the laboratory for a month. A hypothesis test is then done using the following:

  • H0: μ = 10 inches (the fertilizer has no effect on the plant growth)  
  • HA: μ > 10 inches (the fertilizer leads to an increase in plant growth)  

If the p-value is less than the significance level (e.g., α = .05), the null hypothesis can be rejected, and it can be concluded that the fertilizer results in increased plant growth.
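
As a rough illustration, the test can be run in R; the growth measurements below are hypothetical, invented purely to show the mechanics:

```r
# Hypothetical plant growth measurements (inches) after one month of
# fertilizer; these values are illustrative, not from the article.
growth <- c(10.9, 11.4, 9.8, 12.1, 10.5, 11.8, 10.2, 11.0, 12.4, 10.7)

# Right-tailed one-sample t-test of H0: mu = 10 vs. HA: mu > 10
t.test(growth, mu = 10, alternative = "greater")
```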

Example: Clinical Trials  

Consider an example where a doctor believes that a new medicine can decrease blood sugar in patients. To test this, he can measure the blood sugar of 20 diabetic patients before and after administering the new drug for a month. A hypothesis test is then done using the following:

  • H0: μafter = μbefore (the blood sugar is the same as before and after administering the new drug)  
  • HA: μafter < μbefore (the blood sugar is less after the drug)  

If the p-value is less than the significance level (e.g., α = .05), then the null hypothesis can be rejected, and it can be concluded that the new drug reduces blood sugar.
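
A minimal R sketch of such a paired comparison follows; the before/after readings are simulated stand-ins, not real trial data:

```r
# Simulated blood sugar readings (mg/dL) for 20 patients; illustrative only.
set.seed(1)
before <- rnorm(20, mean = 180, sd = 15)
after  <- before - rnorm(20, mean = 8, sd = 10)  # assume an average drop of ~8

# Left-tailed paired t-test of H0: mu_after = mu_before
# vs. HA: mu_after < mu_before
t.test(after, before, paired = TRUE, alternative = "less")
```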

Conclusion  

Now you are aware of the need for hypotheses in business analytics. A hypothesis is not just an assumption; it has to be based on prior knowledge and theories. It also needs to be testable, which means that you can accept or reject it using scientific research methods (such as observations, experiments, and statistical data analysis).

9.4 Full Hypothesis Test Examples

Tests on Means

Example 9.8

Jeffrey, as an eight-year-old, established a mean time of 16.43 seconds for swimming the 25-yard freestyle, with a standard deviation of 0.8 seconds. His dad, Frank, thought that Jeffrey could swim the 25-yard freestyle faster using goggles. Frank bought Jeffrey a new pair of expensive goggles and timed Jeffrey for 15 25-yard freestyle swims. For the 15 swims, Jeffrey's mean time was 16 seconds. Frank thought that the goggles helped Jeffrey to swim faster than 16.43 seconds. Conduct a hypothesis test using a preset α = 0.05. Assume that the swim times for the 25-yard freestyle are normal.

Set up the Hypothesis Test:

Since the problem is about a mean, this is a test of a single population mean .

H 0 : μ = 16.43   H a : μ < 16.43

For Jeffrey to swim faster, his time will be less than 16.43 seconds. The "<" tells you this is left-tailed.

Determine the distribution needed:

Random variable: \(\bar X\) = the mean time to swim the 25-yard freestyle.

Distribution for the test: \(\bar X\) is normal (population standard deviation is known: σ = 0.8)

\[ \bar X \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \quad \text{Therefore,} \quad \bar X \sim N\left(16.43, \frac{0.8}{\sqrt{15}}\right) \]

μ = 16.43 comes from H 0 and not the data. σ = 0.8, and n = 15.

Calculate the p-value using the normal distribution for a mean:

p-value = \(P(\bar x < 16) = 0.0187\), where the sample mean in the problem is given as 16.

p-value = 0.0187 (this is called the actual level of significance). The p-value is the area to the left of the sample mean, which is given as 16.

μ = 16.43 comes from H 0 . Our assumption is μ = 16.43.

Interpretation of the p-value: If H 0 is true, there is a 0.0187 probability (1.87%) that Jeffrey's mean time to swim the 25-yard freestyle is 16 seconds or less. Because a 1.87% chance is small, a mean time of 16 seconds or less is unlikely to have happened by chance alone. It is a rare event.

Compare α and the p-value:

α = 0.05, p-value = 0.0187, so α > p-value.

Make a decision: Since α > p-value, reject H 0 .

This indicates that you reject the null hypothesis that the mean time to swim the 25-yard freestyle is at least 16.43 seconds.

Conclusion: At the 5% significance level, there is sufficient evidence that Jeffrey's mean time to swim the 25-yard freestyle is less than 16.43 seconds. Thus, based on the sample data, we conclude that Jeffrey swims faster using the new goggles.

The Type I and Type II errors for this problem are as follows: The Type I error is to conclude that Jeffrey swims the 25-yard freestyle, on average, in less than 16.43 seconds when, in fact, he actually swims the 25-yard freestyle, on average, in at least 16.43 seconds. (Reject the null hypothesis when the null hypothesis is true.)

The Type II error is that there is not enough evidence to conclude that Jeffrey swims the 25-yard freestyle, on average, in less than 16.43 seconds when, in fact, he actually does swim the 25-yard freestyle, on average, in less than 16.43 seconds. (Do not reject the null hypothesis when the null hypothesis is false.)
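
For readers following along in R, the p-value above can be reproduced directly from the sampling distribution stated in the example:

```r
mu0   <- 16.43  # hypothesized mean, from H0
sigma <- 0.8    # known population standard deviation
n     <- 15     # number of timed swims
xbar  <- 16     # observed sample mean

# Left-tail area under N(16.43, 0.8/sqrt(15)) below the observed mean
pnorm(xbar, mean = mu0, sd = sigma / sqrt(n))  # 0.0187
```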

Try It 9.8

The mean throwing distance of a football for Marco, a high school quarterback, is 40 yards, with a standard deviation of two yards. The team coach tells Marco to adjust his grip to get more distance. The coach records the distances for 20 throws. For the 20 throws, Marco’s mean distance was 45 yards. The coach thought the different grip helped Marco throw farther than 40 yards. Conduct a hypothesis test using a preset α = 0.05. Assume the throw distances for footballs are normal.

First, determine what type of test this is, set up the hypothesis test, find the p-value, sketch the graph, and state your conclusion.

Example 9.9

Jasmine has just begun her new job on the sales force of a very competitive company. In a sample of 16 sales calls, it was found that she closed the contract for an average value of 108 dollars with a standard deviation of 12 dollars. Company policy requires that new members of the sales force must exceed an average of $100 per contract during the trial employment period. Test at the 5% significance level whether the population mean exceeds 100 dollars. Can we conclude that Jasmine has met this requirement?

  • H 0 : µ ≤ 100   H a : µ > 100. The null and alternative hypotheses are for the parameter µ because the number of dollars of the contracts is a continuous random variable. Also, this is a one-tailed test because the company is only interested in whether the number of dollars per contract is below a particular number, not "too high" a number. This can be thought of as making a claim that the requirement is being met, and thus the claim is in the alternative hypothesis.
  • Test statistic: \(t_c = \frac{\bar x - \mu_0}{s/\sqrt{n}} = \frac{108 - 100}{12/\sqrt{16}} = 2.67\)
  • Critical value: \(t_a = 1.753\) with n − 1 = 15 degrees of freedom

The test statistic is a Student's t because the sample size is below 30; therefore, we cannot use the normal distribution. Comparing the calculated value of the test statistic and the critical value of t (\(t_a\)) at a 5% significance level, we see that the calculated value is in the tail of the distribution. Thus, we conclude that 108 dollars per contract is significantly larger than the hypothesized value of 100, and thus we cannot accept the null hypothesis. There is evidence that Jasmine's performance meets company standards.
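
The same calculation is easily reproduced in R:

```r
xbar <- 108; mu0 <- 100; s <- 12; n <- 16

t_c <- (xbar - mu0) / (s / sqrt(n))  # test statistic: 2.67
t_a <- qt(0.95, df = n - 1)          # one-tailed critical value: 1.753

t_c > t_a                # TRUE, so the statistic lies in the rejection tail
1 - pt(t_c, df = n - 1)  # equivalent one-sided p-value, about 0.009
```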

Try It 9.9

It is believed that a stock price for a particular company will grow at a rate of $5 per week with a standard deviation of $1. An investor believes the stock won’t grow as quickly. The changes in stock price are recorded for ten weeks and are as follows: $4, $3, $2, $3, $1, $7, $2, $1, $1, $2. Perform a hypothesis test using a 5% level of significance. State the null and alternative hypotheses, state your conclusion, and identify the Type I error.

Example 9.10

A manufacturer of salad dressings uses machines to dispense liquid ingredients into bottles that move along a filling line. The machine that dispenses salad dressings is working properly when 8 ounces are dispensed. Suppose that the average amount dispensed in a particular sample of 35 bottles is 7.91 ounces, with a sample variance, \(s^2\), of 0.03 ounces squared. Is there evidence that the machine should be stopped and production wait for repairs? The lost production from a shutdown is potentially so great that management feels that the level of significance in the analysis should be 99%.

Again we will follow the steps in our analysis of this problem.

STEP 1 : Set the Null and Alternative Hypothesis. The random variable is the quantity of fluid placed in the bottles. This is a continuous random variable and the parameter we are interested in is the mean. Our hypothesis therefore is about the mean. In this case we are concerned that the machine is not filling properly. From what we are told it does not matter if the machine is over-filling or under-filling; both seem to be an equally bad error. This tells us that this is a two-tailed test: if the machine is malfunctioning it will be shut down regardless of whether it is over-filling or under-filling. The null and alternative hypotheses are thus:

\[ H_0: \mu = 8 \text{ ounces}, \quad H_a: \mu \neq 8 \text{ ounces} \]

STEP 2 : Decide the level of significance and draw the graph showing the critical value.

This problem has already set the level of significance at 99%. The decision seems an appropriate one and shows the thought process when setting the significance level. Management wants to be very certain, as certain as probability will allow, that they are not shutting down a machine that is not in need of repair. To draw the distribution and the critical value, we need to know which distribution to use. Because this is a continuous random variable and we are interested in the mean, and the sample size is greater than 30, the appropriate distribution is the normal distribution and the relevant critical value is 2.575 from the normal table or the t-table at 0.005 column and infinite degrees of freedom. We draw the graph and mark these points.

STEP 3 : Calculate sample parameters and the test statistic. The sample parameters are provided, the sample mean is 7.91 and the sample variance is .03 and the sample size is 35. We need to note that the sample variance was provided not the sample standard deviation, which is what we need for the formula. Remembering that the standard deviation is simply the square root of the variance, we therefore know the sample standard deviation, s, is 0.173. With this information we calculate the test statistic as -3.07, and mark it on the graph.

STEP 4 : Compare the test statistic and the critical values. Now we compare the test statistic and the critical value by placing the test statistic on the graph. We see that the test statistic is in the tail, decidedly greater in absolute value than the critical value of 2.575. We note that even the very small difference between the hypothesized value and the sample value is still a large number of standard deviations. The sample mean is only 0.08 ounces different from the required level of 8 ounces, but it is more than 3 standard deviations away, and thus we cannot accept the null hypothesis.

STEP 5 : Reach a Conclusion

A test statistic three standard deviations from the hypothesized mean all but guarantees rejection. The probability of being three or more standard deviations away is almost zero; on the normal distribution it is 0.0026, which is certainly almost zero in a practical sense. Our formal conclusion would be "At a 99% level of significance we cannot accept the hypothesis that the sample mean came from a distribution with a mean of 8 ounces." Or, less formally, and getting to the point, "At a 99% level of significance we conclude that the machine is under-filling the bottles and is in need of repair."
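
The arithmetic of Steps 3 through 5 can be checked in R:

```r
xbar <- 7.91; mu0 <- 8; n <- 35
s <- sqrt(0.03)  # sample standard deviation from the given variance

z_c    <- (xbar - mu0) / (s / sqrt(n))  # test statistic: about -3.07
z_crit <- qnorm(1 - 0.01 / 2)           # two-tailed critical value: 2.576

abs(z_c) > z_crit     # TRUE, so the null hypothesis cannot be accepted
2 * pnorm(-abs(z_c))  # two-tailed p-value, about 0.002
```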

Try It 9.10

A company records the mean time of employees working in a day. The mean comes out to be 475 minutes, with a standard deviation of 45 minutes. A manager recorded times of 20 employees. The times of working were (frequencies are in parentheses) 460(3); 465(2); 470(3); 475(1); 480(6); 485(3); 490(2).

Conduct a hypothesis test using a 2.5% level of significance to determine if the mean time is more than 475 minutes.

Hypothesis Test for Proportions

Just as there were confidence intervals for proportions, or more formally, the population parameter p of the binomial distribution, there is the ability to test hypotheses concerning p .

The population parameter for the binomial is p . The estimated value (point estimate) for p is p′ where p′ = x/n , x is the number of successes in the sample and n is the sample size.

When you perform a hypothesis test of a population proportion p , you take a simple random sample from the population. The conditions for a binomial distribution must be met: there is a certain number n of independent trials (meaning random sampling), the outcomes of any trial are binary (success or failure), and each trial has the same probability of success p . The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities np′ and nq′ must both be greater than five ( np′ > 5 and nq′ > 5). In this case the binomial distribution of a sample (estimated) proportion can be approximated by the normal distribution with \(\mu = p\) and \(\sigma = \sqrt{pq/n}\). Remember that \(q = 1 - p\). There is no distribution that can correct for this small-sample bias, and thus if these conditions are not met we simply cannot test the hypothesis with the data available at that time. We met this condition when we first were estimating confidence intervals for p .

Again, we begin with the standardizing formula modified because this is the distribution of a binomial.

Substituting \(p_0\), the hypothesized value of p , we have:

\[ Z_c = \frac{p' - p_0}{\sqrt{\frac{p_0 q_0}{n}}} \]

This is the test statistic for testing hypothesized values of p , where the null and alternative hypotheses take one of the following forms:

\[ H_0: p = p_0,\ H_a: p \neq p_0 \quad \text{or} \quad H_0: p \leq p_0,\ H_a: p > p_0 \quad \text{or} \quad H_0: p \geq p_0,\ H_a: p < p_0 \]

The decision rule stated above applies here also: if the calculated value of Z c shows that the sample proportion is "too many" standard deviations from the hypothesized proportion, the null hypothesis cannot be accepted. The decision as to what is "too many" is pre-determined by the analyst depending on the level of significance required in the test.

Example 9.11

The mortgage department of a large bank is interested in the nature of loans of first-time borrowers. This information will be used to tailor their marketing strategy. They believe that 50% of first-time borrowers take out smaller loans than other borrowers. They perform a hypothesis test to determine if the percentage is the same or different from 50%. They sample 100 first-time borrowers and find that 53 of these loans are smaller than those of other borrowers. For the hypothesis test, they choose a 5% level of significance.

STEP 1 : Set the null and alternative hypothesis.

H 0 : p = 0.50   H a : p ≠ 0.50

The words "is the same or different from" tell you this is a two-tailed test. The Type I and Type II errors are as follows: The Type I error is to conclude that the proportion of borrowers is different from 50% when, in fact, the proportion is actually 50%. (Reject the null hypothesis when the null hypothesis is true.) The Type II error is that there is not enough evidence to conclude that the proportion of first-time borrowers differs from 50% when, in fact, the proportion does differ from 50%. (You fail to reject the null hypothesis when the null hypothesis is false.)

STEP 2 : Decide the level of significance and draw the graph showing the critical value

The level of significance has been set by the problem at the 5% level. Because this is a two-tailed test, one-half of the alpha value will be in the upper tail and one-half in the lower tail, as shown on the graph. The critical value for the normal distribution at the 95% level of confidence is 1.96. This can easily be found on the Student’s t-table at the very bottom, at infinite degrees of freedom, remembering that at infinity the t-distribution is the normal distribution. Of course the value can also be found on the normal table, but you have to go looking for one-half of 95 (0.475) inside the body of the table and then read out to the sides and top for the number of standard deviations.

STEP 3 : Calculate the sample parameters and critical value of the test statistic.

The test statistic for testing proportions is based on the normal distribution, Z:

\[ Z_c = \frac{p' - p_0}{\sqrt{\frac{p_0 q_0}{n}}} \]

For this case, the sample of 100 found 53 of the loans were smaller than those of other borrowers. The sample proportion is p′ = 53/100 = 0.53. The test question, therefore, is: “Is 0.53 significantly different from 0.50?” Putting these values into the formula for the test statistic, we find that 0.53 is only 0.60 standard deviations away from 0.50. This is barely off the mean of the standard normal distribution of zero. There is virtually no difference between the sample proportion and the hypothesized proportion in terms of standard deviations.

STEP 4 : Compare the test statistic and the critical value.

The calculated value is well within the critical values of ±1.96 standard deviations, and thus we cannot reject the null hypothesis. To reject the null hypothesis we need significant evidence of a difference between the hypothesized value and the sample value. In this case the sample value is very nearly the same as the hypothesized value measured in terms of standard deviations.

STEP 5 : Reach a conclusion

The formal conclusion would be “At a 5% level of significance we cannot reject the null hypothesis that 50% of first-time borrowers take out smaller loans than other borrowers.” Notice the length to which the conclusion goes to include all of the conditions that are attached to the conclusion. Statisticians, for all the criticism they receive, are careful to be very specific even when this seems trivial. Statisticians cannot say more than they know, and the data constrain the conclusion to be within the metes and bounds of the data.
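
In R, the test statistic and p-value work out as described:

```r
x <- 53; n <- 100; p0 <- 0.50
p_prime <- x / n  # sample proportion: 0.53

z_c <- (p_prime - p0) / sqrt(p0 * (1 - p0) / n)  # 0.60 standard deviations
2 * (1 - pnorm(abs(z_c)))                        # two-tailed p-value, ~0.55

# prop.test(x, n, p = p0, correct = FALSE) gives the equivalent
# chi-square version of the same test
```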

Try It 9.11

A teacher believes that 85% of students in the class will want to go on a field trip to the local zoo. The teacher performs a hypothesis test to determine if the percentage is the same or different from 85%. The teacher samples 50 students and 39 reply that they would want to go to the zoo. For the hypothesis test, use a 1% level of significance.

Example 9.12

Suppose a consumer group suspects that the proportion of households that have three or more cell phones is 30%. A cell phone company has reason to believe that the proportion is not 30%. Before they start a big advertising campaign, they conduct a hypothesis test. Their marketing people survey 150 households with the result that 43 of the households have three or more cell phones.

Here is an abbreviated version of the process for solving hypothesis tests, applied to a test on proportions.

Try It 9.12

Marketers believe that 92% of adults in the United States own a cell phone. A cell phone manufacturer believes that number is actually lower. 200 American adults are surveyed, of which, 174 report having cell phones. Use a 5% level of significance. State the null and alternative hypothesis, find the p -value, state your conclusion, and identify the Type I and Type II errors.

Example 9.13

The National Institute of Standards and Technology provides exact data on conductivity properties of materials. Following are conductivity measurements for 11 randomly selected pieces of a particular type of glass.

1.11; 1.07; 1.11; 1.07; 1.12; 1.08; .98; .98; 1.02; .95; .95

Is there convincing evidence that the average conductivity of this type of glass is greater than one? Use a significance level of 0.05.

Let’s follow a four-step process to answer this statistical question.

  • H 0 : μ ≤ 1
  • H a : μ > 1
  • Plan : We are testing a sample mean without a known population standard deviation with less than 30 observations. Therefore, we need to use a Student's-t distribution. Assume the underlying population is normal.
  • Do the calculations and draw the graph .
  • State the Conclusions : We cannot accept the null hypothesis. It is reasonable to state that the data supports the claim that the average conductivity level is greater than one.
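
Because the raw measurements are given, this test can be verified directly in R:

```r
conductivity <- c(1.11, 1.07, 1.11, 1.07, 1.12, 1.08,
                  0.98, 0.98, 1.02, 0.95, 0.95)

# Right-tailed one-sample t-test of H0: mu <= 1 vs. Ha: mu > 1
t.test(conductivity, mu = 1, alternative = "greater")
# The one-sided p-value comes out below 0.05, so H0 cannot be accepted.
```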

Try It 9.13

The boiling point of a specific liquid is measured for 15 samples, and the boiling points are obtained as follows:

205; 206; 206; 202; 199; 194; 197; 198; 198; 201; 201; 202; 207; 211; 205

Is there convincing evidence that the average boiling point is greater than 200? Use a significance level of 0.1. Assume the population is normal.

Example 9.14

In a study of 420,019 cell phone users, 172 of the subjects developed brain cancer. Test the claim that cell phone users developed brain cancer at a greater rate than that for non-cell phone users (the rate of brain cancer for non-cell phone users is 0.0340%). Since this is a critical issue, use a 0.005 significance level. Explain why the significance level should be so low in terms of a Type I error.

  • H 0 : p ≤ 0.00034
  • H a : p > 0.00034

If we commit a Type I error, we are essentially accepting a false claim. Since the claim describes cancer-causing environments, we want to minimize the chances of incorrectly identifying causes of cancer.

  • We will be testing a sample proportion with x = 172 and n = 420,019. The sample is sufficiently large because we have np′ = 420,019(0.00034) = 142.8 and nq′ = 420,019(0.99966) = 419,876.2, two independent outcomes, and a fixed hypothesized probability of success p = 0.00034. Thus we will be able to generalize our results to the population.
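
The normal-approximation test can be carried through in R; the text leaves the conclusion open, so the code below simply completes the computation:

```r
x <- 172; n <- 420019; p0 <- 0.00034
p_prime <- x / n  # sample proportion, about 0.00041

z_c <- (p_prime - p0) / sqrt(p0 * (1 - p0) / n)  # about 2.44
p_value <- 1 - pnorm(z_c)                        # right-tail p-value, ~0.007
p_value < 0.005  # FALSE: at the strict 0.005 level, H0 is not rejected
```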

Try It 9.14

In a study of 390,000 moisturizer users, 138 of the subjects developed skin diseases. Test the claim that moisturizer users developed skin diseases at a greater rate than that for non-moisturizer users (the rate of skin diseases for non-moisturizer users is 0.041%). Since this is a critical issue, use a 0.005 significance level. Explain why the significance level should be so low in terms of a Type I error.

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Access for free at https://openstax.org/books/introductory-business-statistics-2e/pages/1-introduction
  • Authors: Alexander Holmes, Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Introductory Business Statistics 2e
  • Publication date: Dec 13, 2023
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/introductory-business-statistics-2e/pages/1-introduction
  • Section URL: https://openstax.org/books/introductory-business-statistics-2e/pages/9-4-full-hypothesis-test-examples

© Dec 6, 2023 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

A hypothesis test examines two mutually exclusive claims about a parameter to determine which is best supported by the sample data. The parameter is usually the mean or proportion of some population variable of importance to the marketer.

The null hypothesis (H 0 ) is the status quo or the default position that there is no relationship or no difference. The alternative or research hypothesis (H A ) is the opposite of the null. It represents the relationship or difference.

The conclusion of the hypothesis test can be right or wrong. Erroneous conclusions are classified as Type I or Type II.

Type I error or false positive occurs when the null hypothesis is rejected, even though it is actually true. There is no difference between the groups, contrary to the conclusion that a significant difference exists.

Type II error or false negative occurs when the null hypothesis is accepted, though it is actually false. The conclusion that there is no difference is incorrect.

An oft-quoted example is the jury system, where the defendant is “innocent until proven guilty” (H 0 = “not guilty”, H A = “guilty”). The jury’s decision whether the defendant is not guilty (accept H 0 ), or guilty (reject H 0 ), may be either right or wrong. Convicting the guilty or acquitting the innocent are correct decisions. However, convicting an innocent person is a Type I error, while acquitting a guilty person is a Type II error.

Though one type of error may sometimes be worse than the other, neither is desirable. Researchers and analysts contain the error rates by collecting more data or greater evidence, and by establishing decision norms or standards.

A trade-off, however, is required because adjusting the norm to reduce type I error results in an increase in type II error, and vice versa. Expressed in terms of the probability of making an error, the standards are summarized in Exhibit 33.20:

Exhibit 33.20 Probability of type I error (α), probability of type II error (β), and power (1- β), the probability of correctly rejecting the null hypothesis.

α : Probability of making a Type I error, also referred to as the significance level, is usually set at 0.05 or 5%, i.e., type I error occurs 5% of the time.

β : Probability of making a Type II error.

1 – β : Called power , is the probability of correctly rejecting the null hypothesis.

Power is the probability of correctly rejecting the null hypothesis, i.e., correctly concluding there was a difference. This usually relates to the objective of the study.

Power is dependent on three factors:

  • Type I error (α) or significance level: Power decreases as the significance level decreases. The norm for quantitative studies is α = 5%.
  • Effect size (Δ): The magnitude of the “signal”, or the amount of difference between the parameters of interest. This is specified in terms of standard deviations, i.e., Δ=1 pertains to a difference of 1 standard deviation.
  • Sample size: Power increases with sample size. While very small samples make statistical tests insensitive, very large samples can make them overly sensitive. With excessively large samples, even very small effects can be statistically significant, which raises the issue of practical significance vs. statistical significance.

Power is usually set at 0.8 or 80%, which makes β (type II error) equal to 0.2.

Since both α and power (or β) are typically set according to norms, the size of a sample is essentially a function of the effect size, or the detectable difference. This is discussed further in Section Sample Size — Comparative Studies , in Chapter Sampling .
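
In R, the built-in power.t.test() function performs this kind of calculation. The sketch below asks for the per-group sample size needed to detect an effect of half a standard deviation (Δ = 0.5) at the usual norms of α = 5% and power = 80%:

```r
# Sample size for a two-sample t-test:
# alpha = 0.05, power = 0.80, effect size delta = 0.5 standard deviations
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80,
             type = "two.sample", alternative = "two.sided")
# Returns n of about 64 per group; halving the detectable effect size
# roughly quadruples the required sample size.
```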

From the viewpoint of taking decisions, the distinction between statistical significance and practical or market significance must be clearly understood.

Take, for example, a product validation test (e.g., BASES) whose results reveal, with statistical significance, that a new formulation is likely to increase a brand’s sales by a million dollars. If the gain in sales is too small to offset the costs of introducing the new variant, then the increase is not significant enough, in a practical sense, to justify the launch of the variant.

In another example, pertaining to a retail bank, a number of initiatives targeting high-value customers may have resulted in a reported increase in their customer satisfaction rating from 3.0 to 3.5 on a 5-point scale. This increase suggests that the initiatives had an impact on customer satisfaction. But if the p-value for the data is 0.1, the result is not statistically significant at the usual level (α = 0.05): there is a 10% chance that the difference merely results from sampling error.

If the sample size is increased so that the results are statistically significant, that would increase the level of confidence that the difference is “real” and would justify the introduction of the new initiatives.

Hypothesis tests are classified as one-tailed or two-tailed tests. The one-tailed test specifies the direction of the difference, i.e., the null hypothesis, H 0 , is expressed in terms of the equation parameter ≥ something , or parameter ≤ something .

For instance, in a before and after advertisement screening test, if the ad is expected to improve consumers’ disposition to try a new brand, then the hypothesis may be phrased as follows:

H 0 , null hypothesis: D after ≤ D before

H A , research hypothesis: D after > D before

where D is the disposition to try the product, expressed as the proportion of respondents claiming they will purchase the brand.

If the direction of the difference is not known, a two-tailed test is applied. For instance, if for the same test, the marketer is interested in knowing whether there is a difference between men and women, in their disposition to buy the brand, the hypothesis becomes:

H 0 , null hypothesis: D male = D female

H A , research hypothesis: D male ≠ D female

The standard process for hypothesis testing comprises the following steps:

  • H 0 , H A : State the null and alternative hypothesis.
  • α : Set the level of significance, i.e., the type I error. For most research studies this is set at 5%.
  • Test statistic : Compute the test statistic. Depending on the characteristics of the test this is either the z-score (standard score), the t-value , or the f-ratio .
  • p-value: Obtain the p-value by referencing the test statistic in the relevant distribution table. The normal distribution is used for the z-score, the t distribution for the t-value, and the f distribution for the f-ratio.
  • Test : Accept the research hypothesis H A (reject H 0 ) if p-value < α.

Each of the test statistics is essentially a signal-to-noise ratio, where the signal is the relationship of interest (for instance, the difference in group means), and noise is a measure of variability of groups.

If a measurement scale outcome variable has little variability it will be easier to detect change than if it has a lot of variability (see Exhibit 33.19 ). So, sample size is a function of variability.

A z-score (z) indicates how many standard deviations the sample mean is from the population mean.

\[ z = \frac{\bar x - \mu}{s/\sqrt{n}} \]

where x̄ is the sample mean, μ is the population mean, s is the standard deviation of the sample (an estimate of the population standard deviation σ), and \(s/\sqrt{n}\) is the standard error of the mean (refer CLT).

Details of the t-test are provided in the section t-test, and the f-ratio is covered in the section ANOVA.

Note: The Data Analysis add-in in Excel provides an easy-to-use facility for conducting z, t, and f hypothesis tests. P-value calculators are also available online, for instance at this Social Science Statistics webpage.

One-Tailed — Known Mean and Standard Deviation

For one-tailed, known mean and standard deviation tests the test statistic to use is the z-score.

H 0 : μ ≥120

H A : μ<120

p-value = 0.023 < α = 0.05

Exhibit 33.21 Probability that the sample average consumption of cigarettes is less than 120 is 0.023 or 2.3%.

Note: A p-value from z-score calculator is provided on this webpage .

Two-tailed — Known Mean and Standard Deviation

Example: The mean weight of fresh recruits into the army was reported to be 65.8 kg last year. This year, a sample of 200 recruits was taken, and the mean weight was found to be 66.2 kg. Assuming the population standard deviation is 3.2 kg, at 0.05 significance level, can we conclude that the mean weight has changed since last year?

Exhibit 33.22 If the actual mean was 65.8 kg, there is a 7.7% probability that the sampled recruits would weigh ≥ 66.2 kg or ≤ 65.4 kg.

H 0 : μ=65.8

H A : μ≠65.8

p-value = 0.077 > α = 0.05

The p-value of 0.077 (7.7%), obtained from normal distribution (Exhibit 33.22) for z = 1.77, is not significant for the given level of 5%.

If the actual mean was 65.8 kg, there is a 7.7% probability that the sampled recruits would weigh ≥ 66.2 kg or ≤ 65.4 kg. Since this probability is higher than the significance level of 5%, the null hypothesis is not rejected. We cannot conclude with 95% certainty that the new recruits differ in weight from those recruited last year.
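
The z-score and p-value quoted above can be reproduced in R:

```r
xbar <- 66.2; mu0 <- 65.8; sigma <- 3.2; n <- 200

z <- (xbar - mu0) / (sigma / sqrt(n))  # 1.77
2 * (1 - pnorm(abs(z)))                # two-tailed p-value: 0.077 > 0.05
```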

In cases where the population standard deviation is not known, t-value is used as the test statistic for one-tailed tests. (Refer section t-test for details on the test and the t-value).

H 0 : μ ≥ 15 gm per 100 gm of detergent

H A : μ < 15 gm per 100 gm of detergent

Degrees of freedom = 79.

p-value = 0.014.

Note: A p-value from t-value calculator is provided on this webpage .

A paired group test with unknown standard deviation is essentially the same as a single-sample t-test. The paired values are reduced to a single series by computing the difference between the two sets.

Example: Sequential monadic tests are frequently used for product testing. The respondents try one product and rate it, move to another product and rate it, and then compare the two. A paired t-test may be used to determine whether an improved formulation is rated higher on an attribute. H A : μ > 0, i.e., new product expected to be rated higher.

The remaining steps are the same as that for a one-tailed, single sample t-test.

Comparison test for two groups of unknown standard deviation also requires use of the t-test, since the population’s standard deviation is not known.

Example: A study was conducted to examine the consumption of coffee by office workers. The statistics for the men and women sampled in this study are given below.

Men: Sample size n M =440, mean x̄ M =46.5 cups per month, standard deviation s M =36.3.

Women: Sample size n W =360, mean x̄ W =35.1 cups per month, standard deviation s W =20.6.

There is a difference of 11.4 cups per month in coffee consumption between men and women. Is this difference resulting from sampling error, or do men consume significantly more coffee than women?

H 0 : μ M – μ W ≤ 0

H A : μ M – μ W > 0

Standard error of the difference:

\[ \sqrt{\frac{s_M^2}{n_M} + \frac{s_W^2}{n_W}} = \sqrt{\frac{36.3^2}{440} + \frac{20.6^2}{360}} = 2.04, \qquad t = \frac{46.5 - 35.1}{2.04} = 5.58 \]

The probability of obtaining a t-value of 5.58 or higher with 717 degrees of freedom, when sampling 440 men and 360 women, is very low (0.00001 << α = 0.05). More specifically, if women were consuming as much coffee as men, the chance that the sample difference would average 11.4 or more cups per month is only 0.00001. The null hypothesis is rejected. The data strongly suggest that women consume less coffee than men.
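
Working only from the summary statistics, the calculation can be replicated in R; the degrees of freedom use the Welch-Satterthwaite approximation, which is consistent with the 717 quoted above:

```r
n_m <- 440; xbar_m <- 46.5; s_m <- 36.3  # men
n_w <- 360; xbar_w <- 35.1; s_w <- 20.6  # women

se <- sqrt(s_m^2 / n_m + s_w^2 / n_w)  # standard error of the difference, ~2.04
t  <- (xbar_m - xbar_w) / se           # test statistic, ~5.58

# Welch-Satterthwaite degrees of freedom, ~717
df <- se^4 / ((s_m^2 / n_m)^2 / (n_m - 1) + (s_w^2 / n_w)^2 / (n_w - 1))
1 - pt(t, df)  # one-sided p-value, effectively zero
```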

11 A/B Testing Examples From Real Businesses

Rebecca Riserbato

Published: April 21, 2023

Whether you're looking to increase revenue, sign-ups, social shares, or engagement, A/B testing and optimization can help you get there.

But for many marketers out there, the tough part about A/B testing is often finding the right test to drive the biggest impact — especially when you're just getting started. So, what's the recipe for high-impact success?

Truthfully, there is no one-size-fits-all recipe. What works for one business won't work for another — and finding the right metrics and timing to test can be a tough problem to solve. That’s why you need inspiration from A/B testing examples.

In this post, let's review how a hypothesis will get you started with your testing, and check out excellent examples from real businesses using A/B testing. While the same tests may not get you the same results, they can help you run creative tests of your own. And before you check out these examples, be sure to review the key concepts of A/B testing.

A/B Testing Hypothesis Examples

A hypothesis can make or break your experiment, especially when it comes to A/B testing. When creating your hypothesis, you want to make sure that it’s:

  • Focused on one specific problem you want to solve or understand
  • Able to be proven or disproven
  • Focused on making an impact (bringing higher conversion rates, lower bounce rate, etc.)

When creating a hypothesis, following the "If, then" structure can be helpful, where if you changed a specific variable, then a particular result would happen.

Here are some examples of what that would look like in an A/B testing hypothesis:

  • Shortening contact submission forms to only contain required fields would increase the number of sign-ups.
  • Changing the call-to-action text from "Download now" to "Download this free guide" would increase the number of downloads.
  • Reducing the frequency of mobile app notifications from five times per day to two times per day will increase mobile app retention rates.
  • Using featured images that are more contextually related to our blog posts will contribute to a lower bounce rate.
  • Greeting customers by name in emails will increase the total number of clicks.

Let’s go over some real-life examples of A/B testing to prepare you for your own.

A/B Testing Examples

Website A/B Testing Examples

1. HubSpot Academy's Homepage Hero Image

Most websites have a homepage hero image that inspires users to engage and spend more time on the site. This A/B testing example shows how hero image changes can impact user behavior and conversions.

Based on previous data, HubSpot Academy found that out of more than 55,000 page views, only 0.9% of those users were watching the video on the homepage. Of those viewers, almost 50% watched the full video.

Chat transcripts also highlighted the need for clearer messaging for this useful and free resource.

That's why the HubSpot team decided to test how clear value propositions could improve user engagement and delight.

A/B Test Method

HubSpot used three variants for this test, using HubSpot Academy conversion rate (CVR) as the primary metric. Secondary metrics included CTA clicks and engagement.

Variant A was the control.

A/B testing examples: HubSpot Academy's Homepage Hero

For variant B, the team added more vibrant images and colorful text and shapes. It also included an animated "typing" headline.

A/B testing examples: HubSpot Academy's Homepage Hero

Variant C also added color and movement, as well as animated images on the right-hand side of the page.

A/B testing examples: HubSpot Academy's Homepage Hero

As a result, HubSpot found that variant B outperformed the control by 6%. In contrast, variant C underperformed the control by 1%. From those numbers, HubSpot was able to project that using variant B would lead to about 375 more sign-ups each month.

2. FSAstore.com’s Site Navigation

Every marketer will have to focus on conversion at some point. But building a website that converts is tough.

FSAstore.com is an ecommerce company supplying home goods for Americans with a flexible spending account.

This useful site could help the 35 million+ customers that have an FSA. But the website funnel was overwhelming. It had too many options, especially on category pages. The team felt that customers weren't making purchases because of that issue.

To figure out how to appeal to its customers, this company tested a simplified version of its website. The current site included an information-packed subheader in the site navigation.

To test the hypothesis, this A/B testing example compared the current site to an update without the subheader.

A/B testing examples: FSAstore.com

This update showed a clear boost in conversions and FSAstore.com saw a 53.8% increase in revenue per visitor.

3. Expoze’s Web Page Background

The visuals on your web page are important because they help users decide whether they want to spend more time on your site.

In this A/B testing example, Expoze.io decided to test the background on its homepage.

The website home page was difficult for some users to read because of low contrast. The team also needed to figure out how to improve page navigation while still representing the brand.

First, the team did some research and created several different designs. The goals of the redesign were to improve the visuals and increase attention to specific sections of the home page, like the video thumbnail.

A/B testing examples: Expoze.io

They used AI-generated eye tracking during the design process to find the best designs before A/B testing. Then they ran an A/B heatmap test to see whether the new or current design got the most attention from visitors.

A/B testing examples: Expoze.io heatmaps

The new design showed a big increase in attention, with version B bringing over 40% more attention to the desired sections of the home page.

This design change also brought a 25% increase in CTA clicks. The team believes this is due to the added contrast on the page bringing more attention to the CTA button, which was not changed.

4. Thrive Themes’ Sales Page Optimization

Many landing pages showcase testimonials. That's valuable content and it can boost conversion.

That's why Thrive Themes decided to test a new feature on its landing pages — customer testimonials .

In the control, Thrive Themes had been using a banner that highlighted product features, but not how customers felt about the product.

The team decided to test whether adding testimonials to a sales landing page could improve conversion rates.

In this A/B test example, the team ran a 6-week test with the control against an updated landing page with testimonials.

A/B testing examples: Thrive Themes

This change netted a 13% increase in sales. The control page had a 2.2% conversion rate, but the new variant showed a 2.75% conversion rate.

Email A/B Testing Examples

5. HubSpot's Email Subscriber Experience

Getting users to engage with email isn't an easy task. That's why HubSpot decided to A/B test how alignment impacts CTA clicks.

HubSpot decided to change text alignment in the weekly emails for subscribers to improve the user experience. Ideally, this improved experience would result in a higher click rate.

For the control, HubSpot sent centered email text to users.

A/B test examples: HubSpot, centered text alignment

For variant B, HubSpot sent emails with left-justified text.

A/B test examples: HubSpot, left-justified text alignment

HubSpot found that emails with left-aligned text got fewer clicks than the control. And of the total left-justified emails sent, less than 25% got more clicks than the control.

6. Neurogan’s Deal Promotion

Making the most of email promotion is important for any company, especially those in competitive industries.

This example uses the power of current customers for increasing email engagement.

Neurogan wasn't always offering the right content to its audience and it was having a hard time competing with a flood of other new brands.

An email agency audited this brand's email marketing, then focused efforts on segmentation. This A/B testing example starts with creating product-specific offers. Then, this team used testing to figure out which deals were best for each audience.

These changes brought higher revenue for promotions and higher click rates. It also led to a new workflow with a 37% average open rate and a click rate of 3.85%.

For more on how to run A/B testing for your campaigns, check out this free A/B testing kit .

Social Media A/B Testing Examples

7. vestiaire’s tiktok awareness campaign.

A/B testing examples like the one below can help you think creatively about what to test and when. This is extra helpful if your business is working with influencers and doesn't want to impact their process while working toward business goals.

Fashion brand Vestiaire wanted help growing the brand on TikTok. It was also hoping to increase awareness with Gen Z audiences for its new direct shopping feature.

Vestiaire's influencer marketing agency asked eight influencers to create content with specific CTAs to meet the brand's goals. Each influencer had extensive creative freedom and created a range of different social media posts.

Then, the agency used A/B testing to choose the best-performing content and promoted this content with paid advertising .

A/B testing examples: Vestaire

This testing example generated over 4,000 installs. It also decreased the cost per install by 50% compared to the brand's existing presence on Instagram and YouTube.

8. Underoutfit’s Promotion of User-Generated Content on Facebook

Paid advertising is getting more expensive, and clickthrough rates decreased through the end of 2022 .

To make the most of social ad spend, marketers are using A/B testing to improve ad performance. This approach helps them test creative content before launching paid ad campaigns, like in the examples below.

Underoutfit wanted to increase brand awareness on Facebook.

To meet this goal, it decided to try adding branded user-generated content. This brand worked with an agency and several creators to create branded content to drive conversion.

Then, Underoutfit ran split testing between product ads and the same ads combined with the new branded content ads. Both groups in the split test contained key marketing messages and clear CTA copy.

The brand and agency also worked with Meta Creative Shop to make sure the videos met best practice standards.

A/B testing examples: Underoutfit

The test showed impressive results for the branded content variant, including a 47% higher clickthrough rate and 28% higher return on ad spend.

9. Databricks’ Ad Performance on LinkedIn

Pivoting to a new strategy quickly can be difficult for organizations. This A/B testing example shows how you can use split testing to figure out the best new approach to a problem.

Databricks , a cloud software tool, needed to raise awareness for an event that was shifting from in-person to online .

To connect with a large group of new people in a personalized way, the team decided to create a LinkedIn Message Ads campaign. To make sure the messages were effective, it used A/B testing to tweak the subject line and message copy.

A/B testing examples: Databricks

The third variant of the copy featured a hyperlink in the first sentence of the invitation. Compared to the other two variants, this version got nearly twice as many clicks and conversions.

Mobile A/B Testing Examples

10. HubSpot's Mobile Calls-to-Action

On this blog, you'll notice anchor text in the introduction, a graphic CTA at the bottom, and a slide-in CTA when you scroll through the post. Once you click on one of these offers, you'll land on a content offer page.

While many users access these offers from a desktop or laptop computer, many others plan to download these offers to mobile devices.

But on mobile, users weren't finding the CTA buttons as quickly as they could on a computer. That's why HubSpot tested mobile design changes to improve the user experience.

Previous A/B tests revealed that HubSpot's mobile audience was 27% less likely to click through to download an offer. Also, less than 75% of mobile users were scrolling down far enough to see the CTA button.

So, HubSpot decided to test different versions of the offer page CTA, using conversion rate (CVR) as the primary metric. For secondary metrics, the team measured CTA clicks for each CTA, as well as engagement.

HubSpot used four variants for this test.

For variant A, the control, the traditional placement of CTAs remained unchanged.

For variant B, the team redesigned the hero image and added a sticky CTA bar.

A/B testing examples: HubSpot mobile, A & B

For variant C, the redesigned hero was the only change.

For variant D, the team redesigned the hero image and repositioned the slider.

A/B testing examples: HubSpot mobile, C & D

All variants outperformed the control for the primary metric, CVR. Variant C saw a 10% increase, variant B saw a 9% increase, and variant D saw an 8% increase.

From those numbers, HubSpot was able to project that using variant C on mobile would lead to about 1,400 more content leads and almost 5,700 more form submissions each month.

11. Hospitality.net’s Mobile Booking

Businesses need to keep up with quick shifts in mobile devices to create a consistently strong customer experience.

A/B testing examples like the one below can help your business streamline this process.

Hospitality.net offered both simplified and dynamic mobile booking experiences. The simplified experience showed a limited number of available dates and was designed for smaller screens. The dynamic experience was for larger mobile device screens, showing a wider range of dates and prices.

But the brand wasn’t sure which mobile optimization strategy would be better for conversion.

This brand believed that customers would prefer the dynamic experience and that it would get more conversions. But it chose to test these ideas with a simple A/B test. Over 34 days, it sent half of the mobile visitors to the simplified mobile experience, and half to the dynamic experience, with over 100,000 visitors total.

A/B testing examples: Hospitality.net

This A/B testing example showed a 33% improvement in conversion. It also helped confirm the brand's educated guesses about mobile booking preferences.

A/B Testing Takeaways for Marketers

A lot of different factors can go into A/B testing, depending on your business needs. But there are a few key things to keep in mind:

  • Every A/B test should start with a hypothesis focused on one specific problem that you can test.
  • Make sure you’re testing a control variable (your original version) and a treatment variable (a new version that you think will perform better).
  • You can test various things, like landing pages, CTAs, emails, or mobile app designs.
  • The best way to understand if your results mean something is to figure out the statistical significance of your test (see the sketch after this list).
  • There are a variety of goals to focus on for A/B testing (increased site traffic, lower bounce rates, etc.), but you should be able to test, support, prove, and disprove your hypothesis.
  • When testing, make sure you’re splitting your sample groups equally and randomly, so your data is viable and not due to chance.
  • Take action based on the results you observe.
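
To make the last point concrete, here is a minimal R sketch of a significance check for a two-variant test; the visitor and conversion counts are hypothetical:

```r
# Hypothetical results: variant A (control) vs. variant B
conversions <- c(120, 156)    # conversions per variant
visitors    <- c(2400, 2400)  # equally and randomly split traffic

# Two-proportion test of H0: pA = pB vs. HA: pA != pB
prop.test(conversions, visitors)
# A p-value below the chosen alpha (commonly 0.05) suggests the variants
# really do convert at different rates.
```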

Start Your Next A/B Test Today

You can see amazing results from the A/B testing examples above. These businesses were able to take action on goals because they started testing. If you want to get great results, you've got to get started, too.

Editor's note: This post was originally published in October 2014 and has been updated for comprehensiveness.


4 Examples of Hypothesis Testing in Real Life

In statistics, hypothesis tests are used to test whether a claim about a population parameter is supported by the data.

To perform a hypothesis test in the real world, researchers obtain a random sample from the population and test the sample data against a null and an alternative hypothesis:

  • Null Hypothesis (H0): The sample data occurs purely by chance.
  • Alternative Hypothesis (HA): The sample data is influenced by some non-random cause.

If the p-value of the hypothesis test is less than some significance level (e.g. α = .05), then we reject the null hypothesis and conclude that there is sufficient evidence to support the alternative hypothesis.
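In code, that decision rule is a simple comparison. A minimal R sketch, where the p-value is assumed to come from whatever test you ran:

```r
alpha   <- 0.05
p_value <- 0.012  # assumed output of some hypothesis test

if (p_value < alpha) {
  print("Reject H0: sufficient evidence for the alternative hypothesis")
} else {
  print("Fail to reject H0: insufficient evidence against the null")
}
```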

The following examples provide several situations where hypothesis tests are used in the real world.

Example 1: Biology

Hypothesis tests are often used in biology to determine whether some new treatment, fertilizer, pesticide, chemical, etc. causes increased growth, stamina, immunity, etc. in plants or animals.

For example, suppose a biologist believes that a certain fertilizer will cause plants to grow more during a one-month period than they normally do, where normal growth is currently 20 inches. To test this, she applies the fertilizer to each of the plants in her laboratory for one month.

She then performs a hypothesis test using the following hypotheses:

  • H0: μ = 20 inches (the fertilizer will have no effect on mean plant growth)
  • HA: μ > 20 inches (the fertilizer will cause mean plant growth to increase)

If the p-value of the test is less than some significance level (e.g. α = .05), then she can reject the null hypothesis and conclude that the fertilizer leads to increased plant growth.
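In R, this maps to a one-sided, one-sample t-test against μ = 20. The growth data below are simulated purely for illustration:

```r
set.seed(42)
# Simulated one-month growth (in inches) of fertilized plants -- illustrative only
growth <- rnorm(30, mean = 22, sd = 4)

# H0: mu = 20 vs. HA: mu > 20
t.test(growth, mu = 20, alternative = "greater")
```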

Example 2: Clinical Trials

Hypothesis tests are often used in clinical trials to determine whether some new treatment, drug, procedure, etc. causes improved outcomes in patients.

For example, suppose a doctor believes that a new drug is able to reduce blood pressure in obese patients. To test this, he may measure the blood pressure of 40 patients before and after using the new drug for one month.

He then performs a hypothesis test using the following hypotheses:

  • H0: μafter = μbefore (mean blood pressure is the same before and after using the drug)
  • HA: μafter < μbefore (mean blood pressure is lower after using the drug)

If the p-value of the test is less than some significance level (e.g. α = .05), then he can reject the null hypothesis and conclude that the new drug leads to reduced blood pressure.
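Because the same 40 patients are measured twice, this is a paired design, which maps to a paired, one-sided t-test in R. The blood-pressure readings below are simulated for illustration:

```r
set.seed(1)
# Simulated systolic blood pressure (mmHg) for 40 patients -- illustrative only
before <- rnorm(40, mean = 150, sd = 12)
after  <- before - rnorm(40, mean = 5, sd = 8)  # assumes an average 5-point drop

# H0: mu_after = mu_before vs. HA: mu_after < mu_before
t.test(after, before, paired = TRUE, alternative = "less")
```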

Example 3: Advertising Spend

Hypothesis tests are often used in business to determine whether or not some new advertising campaign, marketing technique, etc. causes increased sales.

For example, suppose a company believes that spending more money on digital advertising leads to increased sales. To test this, the company may increase money spent on digital advertising during a two-month period and collect data to see if overall sales have increased.

They may perform a hypothesis test using the following hypotheses:

  • H0: μafter = μbefore (mean sales are the same before and after spending more on advertising)
  • HA: μafter > μbefore (mean sales increased after spending more on advertising)

If the p-value of the test is less than some significance level (e.g. α = .05), then the company can reject the null hypothesis and conclude that increased digital advertising leads to increased sales.
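If the company records, say, weekly sales before and after the spend increase, the comparison can be run as a one-sided two-sample (Welch) t-test in R. The sales figures below are simulated for illustration:

```r
set.seed(7)
# Simulated weekly sales (in $1,000s) -- illustrative only
sales_before <- rnorm(8, mean = 200, sd = 15)  # ~2 months before the increase
sales_after  <- rnorm(8, mean = 215, sd = 15)  # ~2 months after

# H0: mu_after = mu_before vs. HA: mu_after > mu_before
t.test(sales_after, sales_before, alternative = "greater")
```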

Example 4: Manufacturing

Hypothesis tests are also often used in manufacturing plants to determine whether some new process, technique, or method changes the number of defective products produced.

For example, suppose a certain manufacturing plant wants to test whether or not some new method changes the number of defective widgets produced per month, which is currently 250. To test this, they may measure the mean number of defective widgets produced before and after using the new method for one month.

They can then perform a hypothesis test using the following hypotheses:

  • H0: μafter = μbefore (the mean number of defective widgets is the same before and after using the new method)
  • HA: μafter ≠ μbefore (the mean number of defective widgets produced is different before and after using the new method)

If the p-value of the test is less than some significance level (e.g. α = .05), then the plant can reject the null hypothesis and conclude that the new method leads to a change in the number of defective widgets produced per month.
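Since the plant cares about a change in either direction, the test is two-sided. A minimal R sketch using simulated daily defect counts (a Poisson assumption, purely for illustration):

```r
set.seed(123)
# Simulated daily defect counts over one month before/after -- illustrative only
defects_before <- rpois(30, lambda = 250 / 30)  # ~250 defects per month
defects_after  <- rpois(30, lambda = 230 / 30)

# H0: mu_after = mu_before vs. HA: mu_after != mu_before
t.test(defects_after, defects_before, alternative = "two.sided")
```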


