Believe it or not, this all stems from the formal definition of p-value: the p-value is the probability of getting **as extreme or even more extreme values** if the Null hypothesis is true.

What do we mean by “as extreme or even more extreme”?

Let’s look at an example. A school principal says his high school seniors’ average grade on a standardized test was equal to the state’s average of 81. Thirty-one of the principal’s seniors took the test, and their average score was 79 with a standard deviation of 6 points.

Because the claim, the students’ mean grade = 81, is a form of equality, it is the Null hypothesis. To test the claim, we use a two-tail test. We can reject the Null hypothesis if the principal’s seniors’ average exam grade is *sufficiently higher* *or sufficiently lower* than the state average.

We will test the claim using a significance level of 5% (0.05). Because the principal’s senior class taking the test numbered 31, the sample size is at least 30, so we will use a z-test. Using the formula for z-score (or technology), we find a z-score of -1.86 because the students’ average of 79 is 2 points below 81 and the standard error is 6/√31, or about 1.08.
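If you want to check the z-score by hand, here is a minimal Python sketch of the calculation. The variable names are my own choices for illustration; the numbers come straight from the example.

```python
import math

# Values from the principal's example
state_mean = 81.0    # hypothesized mean under the Null
sample_mean = 79.0   # the seniors' average score
sample_sd = 6.0      # the seniors' standard deviation
n = 31               # number of seniors who took the test

# Standard error of the mean, then the z-score
standard_error = sample_sd / math.sqrt(n)
z = (sample_mean - state_mean) / standard_error

print(round(z, 4))   # should match the -1.8559 StatCrunch reports
print(round(z, 2))   # the two-decimal value used with z-tables
```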

But because we must also consider “as extreme…”, we also have a positive z-score of +1.86, which we would find if the students’ mean score were 83, which is 2 points above the state standard.

For example, using StatCrunch to solve the problem, we get the p-value = 0.0635, which is greater than 0.05 and thus we **do not** reject the Null of equality.

And we conclude the principal’s seniors’ score **was not significantly different** from the state average of 81. Said another way, the sample gives us no good reason to doubt the claim that the principal’s seniors equaled the state average of 81.

If we ask StatCrunch to make a p-value plot, we get this:

Here we can see that there are two red areas under the normal curve which together equal 0.0635. One is the probability of getting a z-stat of +1.8559 or greater and the other is the probability of getting a z-stat of -1.8559 or smaller. We look at both because of the definition of p-value: “as extreme or even more extreme values” in our sample.
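The two red areas can be reproduced with Python's built-in `statistics.NormalDist` (available in Python 3.8 and later). This is only a sketch of the idea, not what StatCrunch does internally, and the variable names are mine.

```python
from statistics import NormalDist

z = -1.8559                 # z-stat from the example
std_normal = NormalDist()   # standard normal: mean 0, standard deviation 1

# Area in the left tail: z-stats of -1.8559 or smaller
left_area = std_normal.cdf(-abs(z))

# Area in the right tail: z-stats of +1.8559 or greater
right_area = 1 - std_normal.cdf(abs(z))

# The two-tail p-value is the sum of both areas
p_two_tail = left_area + right_area
print(round(p_two_tail, 4))   # should land close to the 0.0635 StatCrunch reports
```

Because the normal curve is symmetric, `left_area` and `right_area` come out equal, which is exactly why the two-tail value is twice either one.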

If technology gives us the correct p-value, why does understanding that the two-tail p-value is twice the one-tail p-value matter?

If you are using current technology, you might not have thought about this. The technology, like StatCrunch, just does its thing.

But if you are using older technology, such as z-tables, then you need to know to double what you get from them. And many intro stat courses want you to use the tables for some reason. Almost all z-tables, or more properly “standard normal” tables, give the probability/area under the curve to the **left** of the z-score you are interested in. You should always check the tables you are using, but most will look something like these images:

To find the p-value, we need the area under the curve to the left of -1.86 and to the right of +1.86. This is because we need the probability of getting a mean score of 79 or less on the “low side” of 81, and the probability of getting a mean score 2 or more points [81-79 = 2] higher than 81 on the “high side.” That would give us “at least as extreme” both ways.

Using the typical standard normal table, we find we need two tables – negative z and positive z.

So, from the negative z-table we get an area under the curve below -1.86 of about 0.0314. And the area under the curve to the left of +1.86 is 0.9686. We need the area under the curve to the right of +1.86 (the more extreme side), which is 1 – 0.9686 or 0.0314 again. Adding the two tails gives 0.0314 + 0.0314 = 0.0628, which is simply double the one-tail area.
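Here is a small Python sketch that mimics the table lookup. I am assuming the table rounds z to two decimal places and areas to four, which is typical but worth checking against your own tables; `statistics.NormalDist` stands in for the printed table.

```python
from statistics import NormalDist

std_normal = NormalDist()   # stands in for the printed standard normal table
z_table = 1.86              # z-score rounded to two decimals, as tables require

# "Negative z" table: area to the left of -1.86
area_left_of_neg = round(std_normal.cdf(-z_table), 4)

# "Positive z" table: area to the left of +1.86
area_left_of_pos = round(std_normal.cdf(z_table), 4)

# We want the area to the RIGHT of +1.86, so subtract from 1
area_right_of_pos = round(1 - area_left_of_pos, 4)

# Two-tail p-value from the tables
p_two_tail_table = round(area_left_of_neg + area_right_of_pos, 4)
print(area_left_of_neg, area_left_of_pos, area_right_of_pos, p_two_tail_table)
```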

Here is the p-value plot of the StatCrunch **left-tail** test, which is the test we would run if the principal claimed his seniors scored **at least** the state standard of 81. You can see the left-tail p-value is 0.0317, which is half the two-tail value of 0.0635.

This is a bit different from the tables’ value of 0.0314 for the left and right tails. The difference is small and due to rounding in the tables which limit us to z-scores with two decimal places. StatCrunch is using four decimal places, and so is likely to be a bit more accurate than the tables.

In the left-tail case, we would reject the Null because the p-value of 0.0317 is less than 0.05 and conclude the principal’s students did worse than the state standard of 81.

**But the values either way show the two-tail p-value is twice the one-tail p-value.**
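To see the doubling and the two different decisions side by side, here is a sketch using a hypothetical helper, `z_test_p_value`, that I made up for this post. It is not a StatCrunch or Excel function; it just wraps Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

def z_test_p_value(z, tail):
    """Hypothetical helper: p-value for a z-stat.

    tail must be "left", "right", or "two".
    """
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "two":
        # Both tails: double the area beyond |z| on one side
        return 2 * std_normal.cdf(-abs(z))
    raise ValueError("tail must be 'left', 'right', or 'two'")

alpha = 0.05    # significance level from the example
z = -1.8559     # z-stat from the example

p_left = z_test_p_value(z, "left")
p_two = z_test_p_value(z, "two")

print("left-tail:", round(p_left, 4), "reject Null?", p_left < alpha)
print("two-tail: ", round(p_two, 4), "reject Null?", p_two < alpha)
```

Same data, same z-stat, but the left-tail test rejects the Null at 5% while the two-tail test does not, because the two-tail p-value is exactly double.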

Takeaways:

- Remember most modern technology, StatCrunch, Excel, etc., will give you the two-tail p-value directly.
- If you are using older technology or are finding the one-tail p-value, remember to double it for a two-tail test.
- Be careful how you phrase your claims. Here, if the principal claims his students’ average was the same as the state standard, the test does not contradict him. But if he had claimed the students’ average was at least meeting the state standard, the test would have rejected his claim.

Here is a link to my Excel-based one-sample z-test for the mean. You can download a copy for free.