Alameda County finally reported their raw signature count! According to Friday’s update from the SoS, they had 51,366 raw signatures (a collection rate of 6.4%, the same as the state average), bringing the total raw signature count up to 1,134,746. We’re still waiting for Amador, Inyo, and Trinity to report in, but with only 37,771 registered voters among them, I doubt they’ll contribute more than 2,500 signatures to the raw count.
Also in today’s update are San Francisco’s random sample results. They had a validity rate of 73.7%, bringing the overall validity rate back up to 66.8%. That gives a projection (as of today) of 758,010 valid signatures, not enough to qualify for a full count. (Throwing in my estimate of 2,500 raw signatures from the remaining three counties only adds another 1,670 signatures, still not enough to get a full count.)
In my previous report I discussed the concept of margin of error, so today I calculated it. If a county has a raw count of R, a sample size of S, and a projected validity rate of P (converting the percentage figure to a decimal fraction), then I calculated the margin of error in signatures as R*sqrt(P*(1-P)/S). (Of course, if S is the same as R, as it is for Alpine, Modoc, and Mono counties, the margin of error is zero.) For example, Kings County had 3,187 raw signatures, a sample size of 500, and a projected validity rate of 0.762. That means the margin of error on the projected 2,428 signatures is 61 signatures (about 2.5%).
Doing this calculation for all the counties that have reported so far and combining them (taking the square root of the sum of the squares) gives a margin of error of 795 signatures on the sum of the counties’ projections of 79,552, or about 1%.
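For anyone who wants to check the arithmetic, here it is as a short Python sketch of the formula above. The Kings County figures are the real ones; the other county triples are placeholders to illustrate the quadrature step, not actual data.

```python
import math

def moe_signatures(raw, sample, validity):
    """One-standard-error margin of error, in signatures: R*sqrt(P*(1-P)/S)."""
    return raw * math.sqrt(validity * (1.0 - validity) / sample)

# Kings County, from the figures above: about 61 signatures, or 2.5% of
# the 2,428 projected valid.
kings = moe_signatures(raw=3187, sample=500, validity=0.762)
print(round(kings), f"({kings / 2428:.1%})")

# Combine county-level margins in quadrature (the square root of the sum
# of the squares). Only the Kings triple is real; the others are
# placeholders standing in for the rest of the SoS spreadsheet.
counties = [(3187, 500, 0.762), (12000, 500, 0.70), (45000, 1350, 0.65)]
combined = math.sqrt(sum(moe_signatures(r, s, p) ** 2 for r, s, p in counties))
print(round(combined))
```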
Applying that 1% margin of error to my projection of 758,010 means that Mr. Draper could have as few as 750,430 valid signatures or as many as 765,590. (Actually, what it means is that there is a 68% probability that the true figure is between those limits.) But unless the final projected number of valid signatures is above the 767,235 necessary to trigger a full count, we’ll never know how many valid signatures he actually collected.
One could argue that the criteria for triggering a full count should take the estimated margin of error into consideration; that is, instead of requiring the projection to exceed a fixed number (95% of the amount needed to qualify), a full count should be done whenever the projected range includes the amount needed to qualify. But that’s not the way the law is written.
In a previous report I discussed how duplicate signatures were handled. Jim Riley has posted a good comment on that. In addition, my colleague David Cary has posted a PDF of his derivation of the estimation formula (much clearer and yet more rigorous than my hand-wavy one), as well as the PDF of the SoS’s one page description of the formula.
–Steve Chessin
President, Californians for Electoral Reform (CfER)
www.cfer.org
The opinions expressed here are my own and not necessarily those of CfER.
You had asked if there was a better way of measuring duplicates. It might be better to ignore them completely.
We can calculate the apparent validity rate by adding the number of duplicates in a sample to the number of valid signatures and dividing by the sample size.
For example, for Contra Costa, there were 517 valid signatures, plus 3 duplicates, in a sample of 748. 520/748 is 69.5%, a weak rate, but not terribly so.
But based on those 3 detected duplicates, the state’s method projects a horrible rate of 56.5%, with only 4 out of 7 signatures counting and over 1/8 (13.4%) projected to be duplicates. Going even further, only 43.1% of the signatures were unique and valid.
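To see where the ballooning comes from, here is a rough reconstruction in Python. Two assumptions are mine, not the state’s published formula: the estimator scales the duplicates found in the sample by N(N-1)/(n(n-1)) (pair counting), and Contra Costa’s raw count is about 25,000. With those, the 13.4% figure falls out.

```python
# Rough reconstruction of the Contra Costa projection. ASSUMPTIONS (mine,
# not from the SoS): detected duplicates scale by N*(N-1)/(n*(n-1)), and
# the county's raw count N is about 25,000. The official formula and its
# rounding rules differ slightly.
N, n = 25_000, 748
valid, dups = 517, 3

apparent = (valid + dups) / n              # apparent validity, ~69.5%
dup_rate = dups * (N - 1) / (n * (n - 1))  # projected duplicate rate, ~13.4%
print(f"apparent {apparent:.1%}, projected duplicates {dup_rate:.1%}")
```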
Alternatively, imagine a circulator approaching 7 voters and getting them to sign the petition. It turns out that two of the signatures don’t count, because the signer isn’t registered in the county, or gave a bad address, or a bad signature, or made some other mistake that keeps the voter from being found in the registrar’s rolls. Another 2 of the signers are legitimate, but are either making their 2nd signing or will sign again in the future, so only half of their signatures count. Only 3 of the 7 will be unique and valid signers.
This particular initiative may draw more duplicates than most. It is a populist idea and, superficially at least, easy to understand: take a map puzzle and a saw, and in 60 seconds you have 6 Californias. Some voters may think no more of signing twice than of voting twice for American Idol, or signing some online petition.
But how likely is it that there are no duplicates in San Francisco, yet 13.4% across the Bay in Contra Costa? Not very likely. How about comparing the apparent validity rates of 73.7% (SF) and 69.5% (CC)? Quite plausible. There may be systematic differences: San Franciscans likely spend more of their public time in San Francisco, while people from Contra Costa spend time in Alameda and San Francisco, and perhaps Solano. Even with a Contra Costa petition available in these other locations, there may be differences. Relative to population, the number of signatures in the two counties is quite similar (2.37% in CC and 2.52% in SF).
If the Contra Costa sample had one extra valid signature above the average for a sample of that size, the estimated validity rate would only increase by 0.13 percentage points. But the fact that there were 3 duplicates found, rather than 2, ballooned the duplicate rate from 8.9% to 13.4%. The MOE is much greater for duplicates than for simple validity.
And it is asymmetric. A bad sample (bad in the sense of non-representative) can only improve the apparent validity rate by a small amount, since we can never estimate less than 0% duplicates. But a bad sample in the other direction can harm a petitioner much more, because it can grossly overestimate duplicates.
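The asymmetry is easy to see numerically. Under the same assumed pair-counting estimator and assumed raw count as above, one extra valid signature in the sample moves the validity rate by about an eighth of a point, while one extra detected duplicate moves the projected duplicate rate by almost four and a half points:

```python
# Sensitivity to a single sample signature, under the same assumed
# pair-counting estimator and assumed raw count as above.
N, n = 25_000, 748
one_valid = 1 / n                  # effect of one extra valid signature
one_dup = (N - 1) / (n * (n - 1))  # effect of one extra detected duplicate
print(f"+1 valid: +{one_valid:.2%} validity")           # +0.13%
print(f"+1 duplicate: +{one_dup:.2%} duplicate rate")   # +4.47%
print(f"2 vs 3 duplicates: {2 * one_dup:.1%} vs {3 * one_dup:.1%}")  # 8.9% vs 13.4%
```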
Of the 33 counties completed through August 22, the median estimated duplicate rate is 2.7%. The mean is 4.0%, and the weighted (by raw samples) mean is 5.0%. In the smaller counties where a full count was done, no duplicate rate was above 3.5%. Of course, it is possible that there is a systematic difference: in smaller counties there would be fewer circulators, perhaps working fewer days, and perhaps less chance of duplicates than in larger counties.
From a public policy viewpoint, what is the purpose of sampling? The constitution does not mention sampling. If a petition has enough signatures and the state turns it down on the basis of a sample that estimates otherwise, the state has denied due process to the petitioner and the signers.
If an insufficient petition is accepted, there is not a great deal of harm. The voters will either vote for the idea or against it. The petition is not there to determine whether the measure is a good idea; it is to determine whether to go to the expense of putting the measure on the ballot.
The signature threshold is a weak filter at best. It will not keep bad ideas with lots of money behind them off the ballot, and it will not ensure that good ideas with little money behind them get on it. It may serve to discourage some harebrained ideas, or at least keep the number of issues on the ballot in check.
The purpose of the sample is to save busywork for the county election officials. A full check of the 311,000 signatures in Los Angeles County must cost well into six figures.
A 10% excess is already required to avoid a full count (in the case of this petition, with 1.14 million raw signatures, 10% of the threshold is 7% of the signatures collected).
With this large of a margin, is it really necessary to make a poor attempt at estimating the number of duplicates?
In the case of the Six Californias petition, if we disregard duplicates, then based on 33 counties the projected number of statewide signatures is 823,166, barely above the required 807,615.
If we include the estimate of duplicates, the projected number of valid signatures is 766,341. This is below the 95% threshold for triggering a full count.
Based on this, my best guess would be that the initiative will fail a full check, because of duplicates. But California could deny a full check based on an overestimate of the duplicate rate.
An alternative would be to require samples large enough to produce, with 99.9% certainty, an estimate within 5% of the true number. But to take duplicates into account, this would require much larger samples.
San Francisco found no duplicates in its 612-signature sample. But that does not mean the duplicate rate is low: even if it were as high as 10%, about 15.5% of samples that size would find no duplicates. Increase the sample size to 5%, and there is only a 0.53% chance that a 10% duplicate rate would go undetected. But to be confident that even a 5% duplicate rate did not go undetected, we would need a 9% sample.
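Those probabilities can be sanity-checked with a simple model of my own devising (not the state’s method): assume duplicates come in pairs, a pair is caught only when both copies land in the sample, the number of caught pairs is roughly Poisson, and San Francisco’s raw count is about 20,400 (inferred from the 612-signature, 3% sample). The results land close to the figures above; the residual differences come from the approximation.

```python
import math

def p_undetected(N, n, dup_rate):
    """P(zero duplicates found), assuming dup_rate*N duplicate pairs, a pair
    caught only when both copies land in the sample, and a Poisson
    approximation for the number of caught pairs."""
    expected_caught = dup_rate * N * n * (n - 1) / (N * (N - 1))
    return math.exp(-expected_caught)

N = 20_400  # assumed SF raw count, inferred from the 612-signature 3% sample
print(f"{p_undetected(N, 612, 0.10):.1%}")              # ~16%: 3% sample, 10% duplicates
print(f"{p_undetected(N, 1020, 0.10):.2%}")             # ~0.6%: 5% sample, 10% duplicates
print(f"{p_undetected(N, round(0.09 * N), 0.05):.3%}")  # ~0.03%: 9% sample, 5% duplicates
```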
California counties have actually detected a total of 135 duplicates. 58 of these (43%) were found in Plumas County, which did a full count even though it was not required to. Even so, only 3.5% of its signatures were duplicates, despite over 10% of registered voters signing the petition. (And of course Plumas can’t find any more, since the full count is already done.)
62 duplicates were found in full-count counties, leaving 73 detected duplicates in sampled counties. Based on those 73 duplicates, it is estimated that there will be 21,000 duplicates in the counties that have completed their sample count (which account for 41% of the statewide raw count).
For example, if a 10% sample were used and Contra Costa’s duplicate rate really were as bad as estimated, we would expect to find 33 duplicates in a sample of 2,495. We would have a much finer estimate, and a much better idea of how bad the problem was.
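Under the same both-copies-in-the-sample assumption, the expected number of detected duplicates scales with the number of pairs the sample can contain, which is where the 33 comes from:

```python
# Scale the 3 duplicates found in Contra Costa's 748-signature sample to a
# hypothetical 2,495-signature (10%) sample: detection grows with the
# number of pairs the sample can contain.
found, n_small, n_big = 3, 748, 2495
expected = found * n_big * (n_big - 1) / (n_small * (n_small - 1))
print(round(expected))  # 33
```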
Another approach would give counties a short period to take a 3% sample. Some counties had already completed their 3% sample before Los Angeles had even returned its raw count. So perhaps something like this: 2 weeks to complete the raw count; 4 weeks to do a 3% sample (the first 2 weeks included); or 6 weeks to do a 10% sample.
I’ve been pedantically following every update on my own spreadsheet.
As of September 5th, about two-thirds of the random sample is complete: 68.5% of the signatures that will be checked as part of the random sample have been, representing 64.5% of the raw signatures, in counties accounting for almost precisely two-thirds of the state’s registered voters. The proposition is teetering on the edge of qualifying for a full count. With a current validity rate of 67.65%, the remaining third must achieve a validity rate of 67%; in other words, it can slightly underperform the first two-thirds and still meet the full-count threshold.
To reach the ballot without a full signature count, the remaining counties would need a validity rate of 97.2%, which is impossibly high.
It is a moot projection, since any proposition this close to qualifying will require a full count anyway, but 77.09% of the remaining signatures must be valid for the sample to project enough signatures to reach the ballot.
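For anyone checking along at home, those three required rates (67%, 97.2%, 77.09%) can be reproduced with a few lines of Python. The inputs are approximations, so the outputs differ from the quoted figures by rounding: a statewide raw count of about 1,137,000, 64.5% of it already checked at 67.65% validity, 807,615 valid signatures required, a full count triggered at 95% of that, and outright qualification at 110%.

```python
# Back-of-envelope check of the three required rates. Inputs are
# approximations: ~1,137,000 raw statewide, 64.5% checked at 67.65%
# validity, 807,615 valid required, full count triggered at 95% of that,
# outright qualification at 110%.
RAW, CHECKED, RATE = 1_137_000, 0.645, 0.6765
REQUIRED = 807_615

valid_so_far = RAW * CHECKED * RATE
remaining = RAW * (1 - CHECKED)
for label, target in [("trigger a full count (95%)", 0.95 * REQUIRED),
                      ("project qualification (100%)", 1.00 * REQUIRED),
                      ("qualify outright (110%)", 1.10 * REQUIRED)]:
    need = (target - valid_so_far) / remaining
    print(f"{label}: remaining counties need {need:.1%}")
```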
Trying to guess Los Angeles: I ran a few linear regressions of the number of projected valid signatures, against the number of registered voters and against the raw count. The former has a pretty low R^2 (0.83), so I’m discounting it, but it predicts 195k valid (62.5%) in Los Angeles; in other words, it says the larger the county, the proportionally fewer valid signatures. On the other hand, raw count vs. projected valid gave an R^2 of 0.99, but it simply predicts that LA will follow the state as a whole at 67.52%, and thus shows no significant decline in validity rate as a county’s total signature count increases.
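The regression itself is a one-liner in any stats package. Here is a minimal sketch of the raw-count version; the county arrays are placeholders rather than the actual spreadsheet data, so the printed LA figure is illustrative only.

```python
import numpy as np

# Minimal sketch of the raw-count regression. These arrays are
# placeholders, not the actual spreadsheet data; substitute the real
# per-county (raw, projected valid) figures before trusting the output.
raw = np.array([3_187, 20_400, 25_000], dtype=float)
valid = np.array([2_428, 15_035, 14_125], dtype=float)

slope, intercept = np.polyfit(raw, valid, 1)
pred = slope * raw + intercept
r2 = 1 - ((valid - pred) ** 2).sum() / ((valid - valid.mean()) ** 2).sum()
print(f"R^2 = {r2:.2f}; implied LA projection = {slope * 311_000 + intercept:,.0f}")
```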
So far, based on the duplicates found in the random sample, 5.7% of the raw signature count is projected to be duplicates. If every projected duplicate were in fact legitimate (i.e., if the duplicate projection were so screwy that the 171 duplicates actually counted [so far] were the only real duplicates, an incredibly unlikely scenario), it would raise the valid rate to 71.7%, which would be just barely enough for the measure to qualify for the ballot.
On my spreadsheet, my projected duplicate count is often 1 higher than the state’s (in one county, 2 higher). I don’t know whether that’s an error in my rounding and truncating, or in the SoS’s. The law about projecting duplicates specifies exactly how to round some calculations, but not others. But it is exceedingly unlikely that those ambiguities (which by my estimate could never exceed 50 duplicates) could tip any of the action thresholds.
Tim Draper initially claimed 1.3 million signatures were submitted, but the Secretary of State’s raw count is only 1.137 million, meaning 13% of the signatures disappeared before they were even considered. If those missing 168k signatures met the prevailing validity rate, the measure would have qualified for the ballot without a full count. I wonder whether the discrepancy is because submitted petitions were rejected wholesale, or because the measure’s proponents erred, or lied to maintain media exposure for their cause. If some petitions bearing signatures were rejected before the raw count, I wonder how many, and why?