While the SoS hasn’t posted a new report since Wednesday (there was a tantalizing broken link yesterday that implied there was an update, but that was a false alarm), I did find out why there was a discrepancy between my numbers and hers. It has to do with how one accounts for duplicates. And it isn’t simple.

The regulations that describe how to verify signatures (a PDF version is available here) specify how this is done, and the SoS’s office sent me a nice one-page summary of the formula. They couldn’t provide me with the mathematical background for the formula, however, so I did a web search on the phrase “sampling petitions for duplicate signatures” (I prefer Yahoo! but you can use whatever search engine you like) and that led me to this paper. It’s pretty heavy sledding unless you have a good background in statistics (which I do not), but the takeaway is that each duplicate found in the sample affects the validity rate in approximate inverse proportion to the **square** of the sampling fraction (the sample size divided by the raw count). That is, if you sample 10% of the signatures, while each invalid signature in the sample represents 10 in the total, each duplicate in the sample represents **90** in the total. (This is using the SoS’s formula, which may or may not be identical to the one in the paper.)

The exact formula goes like this:

Let V = (raw count) * (valid signatures in sample) / (sample size).

This is the uncorrected projected count of valid signatures. Note that (valid signatures in sample) / (sample size) is the uncorrected validity rate; this is what I reported in my previous post.

Let A = (raw count) / (sample size). They call this the “value of each (sampled) signature”; it’s the inverse of the sample fraction. You’ll note that V is A * (valid signatures in sample).

Let B = A * (A – 1). This is the “extra value” of each duplicate. (I’m not sure where the “-1” comes from, but I’ll take their word for it.)

Let C = B * (number of duplicate signatures in the sample). This is the correction term due to the duplicate signatures; it gets subtracted from V.

Then V – C is the corrected projected valid signatures,

and (V – C) / (raw count) is the corrected validity rate.
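For anyone who wants to check the arithmetic themselves, here is a minimal Python sketch of the formula as described above. The function name, the argument names, and the sample numbers are my own inventions for illustration; they do not come from the SoS's cheat sheet or from any county's actual report.

```python
def corrected_projection(raw_count, sample_size, valid_in_sample, duplicates_in_sample):
    """Apply the duplicate correction described above.

    Returns (corrected projected valid signatures, corrected validity rate).
    All names here are hypothetical; only the arithmetic follows the post.
    """
    a = raw_count / sample_size      # A: value of each sampled signature
    v = a * valid_in_sample          # V: uncorrected projected valid signatures
    b = a * (a - 1)                  # B: extra value of each duplicate
    c = b * duplicates_in_sample     # C: correction due to duplicates
    corrected_valid = v - c
    return corrected_valid, corrected_valid / raw_count


# Made-up numbers: a 10% sample of 10,000 signatures,
# with 800 valid signatures and 2 duplicates found in the sample.
corrected_valid, corrected_rate = corrected_projection(10_000, 1_000, 800, 2)
print(corrected_valid, corrected_rate)
```

With these made-up inputs, A is 10 and B is 90, so the two duplicates knock 180 signatures off the uncorrected projection of 8,000.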

In any event, when I use the SoS’s formula, I do indeed get the same results. For the four counties reported so far, we have corrected validity rates of 76.4% (Sierra), 54.8% (Solano), 57.8% (Sonoma), and 75.8% (Sutter). The overall validity rate so far (calculated by adding the corrected projected valid signatures from each of those four counties and dividing that sum by the sum of the raw counts of the four counties) is 58.1%.
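The overall rate described above is a weighted figure, not a simple average of the county percentages. A small Python sketch of that calculation follows; the function name and the county figures in it are hypothetical placeholders, not the actual numbers for Sierra, Solano, Sonoma, or Sutter.

```python
def overall_validity_rate(counties):
    """Combine counties as described above.

    `counties` is a list of (raw_count, corrected_projected_valid) pairs.
    The result is the sum of the projections divided by the sum of raw counts.
    """
    total_raw = sum(raw for raw, _ in counties)
    total_valid = sum(valid for _, valid in counties)
    return total_valid / total_raw


# Hypothetical figures, for illustration only.
hypothetical_counties = [
    (1_000, 760.0),      # small county, 76.0% corrected rate
    (20_000, 11_000.0),  # 55.0%
    (30_000, 17_400.0),  # 58.0%
]
overall = overall_validity_rate(hypothetical_counties)
print(round(overall * 100, 1))
```

Because the sums are taken before dividing, a large county pulls the overall rate toward its own rate much more than a small one does.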

We’ll have to wait for more counties to report their results to see if Six Californias is likely to make it to the ballot. I can’t guarantee I’ll report on every update the SoS releases, but I’ll try.

Thanks and a tip of the hat to Katherine Montgomery of the Secretary of State’s office for providing me with both the regulations and the one-page “cheat sheet”, and to former CfER Board member David Cary for alluding to the inverse square dependency and suggesting the keywords to use in a web search.