A stratified sample is being designed to estimate the prevalence p of a rare characteristic, say the proportion of residents in Milwaukee, Wisconsin, who have Lyme disease. Stratum 1, with N1 units, has a high prevalence of the characteristic; stratum 2, with N2 units, has low prevalence. Assume that the cost to sample a unit (for example, the cost to select a person for the sample and determine whether he or she has Lyme disease) is the same for each stratum, and that at most 2000 units are to be sampled.
a Let p1 and p2 be the proportions in stratum 1 and stratum 2 with the rare characteristic. If p1 = 0.10, p2 = 0.03, and N1/N = 0.4, what are n1 and n2 under optimal allocation?
b If p1 = 0.10, p2 = 0.03, and N1/N = 0.4, what is V(p̂_str) under proportional allocation? Under optimal allocation? What is the variance if you take an SRS of 2000 units from the population?
c (Use a spreadsheet for this part of the exercise.) Now fix p = 0.05. Let p1 range from 0.05 to 0.50, and N1/N range from 0.01 to 0.50 (these two values then determine the value of p2). For each combination of p1 and N1/N, find the optimal allocation, and the variance under both proportional allocation and optimal allocation. Also find the variance from an SRS of 2000 units. When does the optimal allocation give a substantial increase in precision when compared to proportional allocation? When compared to an SRS?
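As an alternative to a spreadsheet, the calculations in parts (a) through (c) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a definitive solution: it ignores the finite population correction (N is not given), approximates the stratum variance by S_h^2 = p_h(1 - p_h), uses Neyman allocation (n_h proportional to N_h S_h) as the optimal allocation under equal costs, and leaves the stratum sample sizes unrounded. The function names `allocation_summary` and `implied_p2` and the grid values are illustrative choices, not part of the exercise.

```python
import math


def allocation_summary(p1, p2, w1, n=2000):
    """Compare allocations for two strata; w1 = N1/N, fpc ignored (assumption)."""
    w2 = 1.0 - w1
    # Assumed approximation: S_h^2 = p_h * (1 - p_h)
    s1 = math.sqrt(p1 * (1.0 - p1))
    s2 = math.sqrt(p2 * (1.0 - p2))
    weighted_sd = w1 * s1 + w2 * s2

    # Neyman allocation with equal costs: n_h proportional to N_h * S_h
    n1_opt = n * w1 * s1 / weighted_sd
    n2_opt = n - n1_opt

    # Variances of the estimated proportion under each design
    v_prop = (w1 * s1**2 + w2 * s2**2) / n   # proportional allocation
    v_opt = weighted_sd**2 / n               # optimal (Neyman) allocation
    p = w1 * p1 + w2 * p2                    # overall prevalence
    v_srs = p * (1.0 - p) / n                # SRS of n units

    return {"n1_opt": n1_opt, "n2_opt": n2_opt,
            "v_prop": v_prop, "v_opt": v_opt, "v_srs": v_srs}


def implied_p2(p, p1, w1):
    """Solve p = w1*p1 + (1 - w1)*p2 for p2; return None if p2 would be negative."""
    p2 = (p - w1 * p1) / (1.0 - w1)
    return p2 if p2 >= 0.0 else None


if __name__ == "__main__":
    # Parts (a) and (b)
    print(allocation_summary(p1=0.10, p2=0.03, w1=0.4))

    # Part (c): illustrative grid with p fixed at 0.05
    for p1 in (0.05, 0.10, 0.20, 0.30, 0.40, 0.50):
        for w1 in (0.01, 0.05, 0.10, 0.25, 0.50):
            p2 = implied_p2(0.05, p1, w1)
            if p2 is None:
                continue  # this combination cannot give p = 0.05
            r = allocation_summary(p1, p2, w1)
            print(p1, w1, round(p2, 4), round(r["n1_opt"]),
                  r["v_prop"] / r["v_opt"], r["v_srs"] / r["v_opt"])
```

The last two printed columns are the variance ratios of proportional allocation and of an SRS relative to optimal allocation, so values well above 1 mark the combinations where optimal allocation helps most. Note that some combinations in the stated ranges (for example, p1 = 0.50 with N1/N = 0.50) would require a negative p2 and are skipped.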