Usually we have no control over the sample size of a data set. However, if we are able to set the sample size, as in cases where we are taking a survey, it is very helpful to know just how large it should be to provide the most information. Sampling can be very costly, in both time and product. Simple telephone surveys will cost approximately $30.00 each, for example, and some sampling requires the destruction of the product. Selecting a sample that is too large is expensive and time consuming. But selecting a sample that is too small can lead to inaccurate conclusions. We want to find the minimum sample size required to achieve the desired level of accuracy in the confidence interval.
The margin of error [latex]E[/latex] for a confidence interval for a population mean is
[latex]\displaystyle{E=z \times \frac{\sigma}{\sqrt{n}}}[/latex]
where [latex]z[/latex] is the [latex]z[/latex]-score such that the area under the standard normal distribution between [latex]-z[/latex] and [latex]z[/latex] is the confidence level [latex]C[/latex].
Rearranging this formula for [latex]n[/latex], we get a formula for the sample size [latex]n[/latex]:
[latex]\displaystyle{n=\left(\frac{z \times \sigma}{E}\right)^2}[/latex]
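In order to use this formula, we need values for [latex]z[/latex], [latex]E[/latex] and [latex]\sigma[/latex]. The [latex]z[/latex]-score is determined by the confidence level [latex]C[/latex], [latex]E[/latex] is the largest acceptable difference between the sample mean and the population mean, and [latex]\sigma[/latex] is an estimate of the population standard deviation, for example from previous information or a pilot study.
To find the [latex]z[/latex]-score for a confidence interval with confidence level [latex]C[/latex], use the norm.s.inv(area to the left of z) function, where the area to the left of [latex]z[/latex] is [latex]\displaystyle{\frac{1+C}{2}}[/latex].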
The output from the norm.s.inv function is the value of [latex]z[/latex]-score needed to find the sample size.
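Outside a spreadsheet, the same [latex]z[/latex]-score can be computed in Python. The sketch below assumes Python 3.8 or later, where the standard-library `statistics.NormalDist().inv_cdf` is the inverse of the standard normal cumulative distribution, playing the same role as norm.s.inv. The helper name `z_for_confidence` is hypothetical.

```python
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Return z so that the area between -z and z under the standard normal curve is `confidence`."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # The area to the left of z is C plus half of the leftover tail area, i.e. (1 + C) / 2.
    area_left = (1 + confidence) / 2
    # inv_cdf plays the role of norm.s.inv(area to the left of z).
    return NormalDist().inv_cdf(area_left)


for c in (0.90, 0.94, 0.95, 0.98):
    print(f"C = {c:.2f}: area to the left = {(1 + c) / 2:.3f}, z = {z_for_confidence(c):.4f}")
```

The printed values match the norm.s.inv outputs used in the examples in this section (for instance, an area of 0.975 gives [latex]z \approx 1.96[/latex]).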
We want to estimate the mean age of Foothill College students. From previous information, an estimate of the standard deviation of the ages of the students is 15 years. We want to be 95% confident that the sample mean age is within two years of the population mean age. How many randomly selected Foothill College students must be surveyed to achieve the desired level of accuracy?
Solution:
To find the sample size, we need to find the [latex]z[/latex]-score for the 95% confidence interval. This means that we need to find the [latex]z[/latex]-score so that the area to the left of [latex]z[/latex] is [latex]\displaystyle{\frac{1+0.95}{2}=0.975}[/latex].
Function | norm.s.inv | Answer |
Field 1 | 0.975 | 1.9599… |
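So [latex]z=1.9599[/latex]. From the question, [latex]\sigma \simeq 15[/latex] and [latex]E=2[/latex].
[latex]\begin{eqnarray*} n & = & \left(\frac{z \times \sigma}{E}\right)^2 \\ & = & \left(\frac{1.9599 \times 15}{2}\right)^2 \\ & = & 216.082 \\ & \Rightarrow & 217 \mbox{ students}\end{eqnarray*}[/latex]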
217 students must be surveyed to achieve the desired accuracy.
Remember to round the value for the sample size UP to the next integer. This ensures that the sample size is an integer and is large enough. Do not forget to include appropriate units with the sample size.
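The calculation in this example, including the round-up rule, can be sketched in Python. This is a minimal sketch assuming Python 3.8 or later for `statistics.NormalDist`; the function name `sample_size_mean` is hypothetical, and [latex]\sigma[/latex] is treated as a known estimate exactly as in the example.

```python
import math
from statistics import NormalDist


def sample_size_mean(sigma: float, margin: float, confidence: float) -> int:
    """Smallest n such that z * sigma / sqrt(n) <= margin at the given confidence level."""
    # z-score with area (1 + C) / 2 to its left, like norm.s.inv.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    n = (z * sigma / margin) ** 2
    # Always round UP: rounding down could leave the margin of error above E.
    return math.ceil(n)


print(sample_size_mean(sigma=15, margin=2, confidence=0.95))   # 217 (Foothill College ages)
print(sample_size_mean(sigma=3, margin=1.5, confidence=0.98))  # 22 (basketball players' heights)
```

Using `math.ceil` rather than `round` is what implements "round UP to the next integer".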
You want to estimate the mean height of all high school basketball players. You want to be 98% confident with a margin of error of 1.5 inches. From a small pilot study, you estimate the standard deviation to be 3 inches. How large a sample do you need to take to achieve the desired level of accuracy?
Solution:
Function | norm.s.inv | Answer |
Field 1 | 0.99 | 2.3263… |
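So [latex]z=2.3263[/latex]. From the question, [latex]\sigma \simeq 3[/latex] and [latex]E=1.5[/latex].
[latex]\begin{eqnarray*} n & = & \left(\frac{z \times \sigma}{E}\right)^2 \\ & = & \left(\frac{2.3263 \times 3}{1.5}\right)^2 \\ & = & 21.6476 \\ & \Rightarrow & 22 \mbox{ high school basketball players}\end{eqnarray*}[/latex]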
The margin of error [latex]E[/latex] for a confidence interval for a population proportion is
[latex]\displaystyle{E=z \times \sqrt{\frac{p(1-p)}{n}}}[/latex]
where [latex]z[/latex] is the [latex]z[/latex]-score such that the area under the standard normal distribution between [latex]-z[/latex] and [latex]z[/latex] is the confidence level [latex]C[/latex].
Rearranging this formula for [latex]n[/latex], we get a formula for the sample size [latex]n[/latex]:
[latex]\displaystyle{n=p \times (1-p) \times \left(\frac{z}{E}\right)^2}[/latex]
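In order to use this formula, we need values for [latex]z[/latex], [latex]E[/latex] and [latex]p[/latex]. The [latex]z[/latex]-score is determined by the confidence level [latex]C[/latex]. The margin of error [latex]E[/latex] is the largest acceptable difference between the sample proportion [latex]\hat{p}[/latex] and the population proportion [latex]p[/latex]; in other words, [latex]E[/latex] is set to the maximum allowable half-width of the confidence interval. For [latex]p[/latex], use an estimate of the population proportion if one is available; otherwise use [latex]p=0.5[/latex], which gives the largest (most conservative) sample size because [latex]p(1-p)[/latex] is largest when [latex]p=0.5[/latex].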
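The proportion formula can be sketched the same way. This minimal sketch assumes Python 3.8 or later for `statistics.NormalDist`; the function name `sample_size_proportion` is hypothetical, and its default `p=0.5` mirrors the rule of using [latex]p=0.5[/latex] when no estimate is available.

```python
import math
from statistics import NormalDist


def sample_size_proportion(margin: float, confidence: float, p: float = 0.5) -> int:
    """Smallest n such that z * sqrt(p(1 - p) / n) <= margin at the given confidence level."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    # z-score with area (1 + C) / 2 to its left, like norm.s.inv.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    n = p * (1 - p) * (z / margin) ** 2
    # Round UP so that the margin of error does not exceed E.
    return math.ceil(n)


print(sample_size_proportion(margin=0.03, confidence=0.90))  # 752 customers aged 50+
print(sample_size_proportion(margin=0.05, confidence=0.94))  # 354 customers
```

If a pilot study or earlier survey gives an estimate of the proportion, pass it as `p`; any value other than 0.5 lowers the required sample size.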
There is an interesting trade-off between the level of confidence and the sample size that shows up here when considering the cost of sampling. The table below shows the appropriate sample size at different levels of confidence and different margins of error, assuming [latex]p=0.5[/latex]. Looking at each row, we can see that for the same margin of error, a higher level of confidence requires a larger sample size. Similarly, looking at each column, we can see that for the same confidence level, a smaller margin of error requires a larger sample size.
Required Sample Size (90%) | Required Sample Size (95%) | Margin of Error |
1691 | 2401 | 2% |
752 | 1068 | 3% |
271 | 385 | 5% |
68 | 97 | 10% |
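The entries of the table can be recomputed from the sample-size formula for a proportion with [latex]p=0.5[/latex], rounding each result up. This is a minimal sketch assuming Python 3.8 or later; the helper name `required_n` is hypothetical.

```python
import math
from statistics import NormalDist


def required_n(margin: float, confidence: float, p: float = 0.5) -> int:
    """Smallest integer n with n >= p(1 - p)(z / E)^2."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil(p * (1 - p) * (z / margin) ** 2)


# Print the table row by row: one row per margin of error, one column per confidence level.
print("Required Sample Size (90%) | Required Sample Size (95%) | Margin of Error")
for margin in (0.02, 0.03, 0.05, 0.10):
    print(f"{required_n(margin, 0.90)} | {required_n(margin, 0.95)} | {margin:.0%}")
```

Reading down either column shows the sample size growing as the margin of error shrinks; reading across a row shows it growing with the confidence level.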
Suppose a mobile phone company wants to determine the current percentage of customers aged 50+ who use text messaging on their cell phones. How many customers aged 50+ should the company survey in order to be 90% confident with a margin of error of 3%?
Solution:
To find the sample size, we need to find the [latex]z[/latex]-score for the 90% confidence interval. This means that we need to find the [latex]z[/latex]-score so that the area to the left of [latex]z[/latex] is [latex]\displaystyle{\frac{1+0.90}{2}=0.95}[/latex].
Function | norm.s.inv | Answer |
Field 1 | 0.95 | 1.6448… |
So [latex]z=1.6448[/latex]. From the question, [latex]E=0.03[/latex]. Because no estimate of the population proportion is given, we use [latex]p=0.5[/latex].
[latex]\begin{eqnarray*} n & = & p \times (1-p) \times \left(\frac{z}{E}\right)^2 \\ & = & 0.5 \times (1-0.5) \times \left(\frac{1.6448}{0.03}\right)^2 \\ & = & 751.539 \\ & \Rightarrow & 752 \mbox{ customers aged 50+}\end{eqnarray*}[/latex]
752 customers aged 50+ must be surveyed to achieve the desired accuracy.
Remember to round the value for the sample size UP to the next integer. This ensures that the sample size is large enough. Do not forget to include appropriate units with the sample size.
Suppose an internet marketing company wants to determine the percentage of customers who click on ads on their smartphones. How many customers should the company survey in order to be 94% confident that the estimated proportion is within 5% of the population proportion of customers who click on ads on their smartphones?
Solution:
Function | norm.s.inv | Answer |
Field 1 | 0.97 | 1.8807… |
[latex]\begin{eqnarray*} n & = & p \times (1-p) \times \left(\frac{z}{E}\right)^2 \\ & = & 0.5 \times (1-0.5) \times \left(\frac{1.8807}{0.05}\right)^2 \\ & = & 353.738 \\ & \Rightarrow & 354 \mbox{ customers}\end{eqnarray*}[/latex]
In order to construct a confidence interval, a sample is taken from the population under study. But collecting sample information is time consuming and expensive. The minimum sample size required to achieve the desired level of accuracy is determined before collecting the sample data.
After calculating the value of [latex]n[/latex] from the formula, round the value of [latex]n[/latex] up to the next integer.
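To see why rounding up matters, the sketch below uses the Foothill College numbers ([latex]\sigma \simeq 15[/latex] years, [latex]E=2[/latex] years, 95% confidence), for which the formula gives [latex]n \approx 216.08[/latex]. It recomputes the margin of error [latex]E=z \times \sigma/\sqrt{n}[/latex] for the value rounded to the nearest integer (216) and for the value rounded up (217). This is a minimal sketch assuming Python 3.8 or later; `margin_of_error_mean` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def margin_of_error_mean(sigma: float, n: int, confidence: float) -> float:
    """Margin of error z * sigma / sqrt(n) for a confidence interval for a mean."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * sigma / math.sqrt(n)


# Foothill College example: sigma = 15 years, desired E = 2 years, C = 95%.
raw_n = (NormalDist().inv_cdf(0.975) * 15 / 2) ** 2
for n in (round(raw_n), math.ceil(raw_n)):
    e = margin_of_error_mean(15, n, 0.95)
    print(f"n = {n}: E = {e:.4f} years, meets E <= 2? {e <= 2}")
```

With 216 students the margin of error is slightly above 2 years (about 2.0004), while 217 students bring it below 2 years (about 1.9958), which is why the value of [latex]n[/latex] is always rounded up.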