All posts by steve.chessin

Report #7 on the Six Californias Signature Verification Process

There was no report from the SoS Monday. In Tuesday’s report (available at this website), Inyo County reported a raw count of 616 signatures. (That leaves just Amador and Trinity counties to report their raw numbers. Maybe we should have a pool on who will be last? :-)) Also, during Plumas County’s full count (they didn’t bother with a random sampling), they discovered that their raw count was only 1,618, not the 1,626 they originally reported. That brings the total raw count to 1,135,354 from 1,134,746. Plumas’s validity rate was 76.9%. In addition to Plumas, the following counties have finished their random sampling (with validity rate as indicated): Butte (66.6%), Madera (63.5%), and Mendocino (72.3%). The overall validity rate now stands at 66.9%, up slightly from the 66.8% reported last time.

Twenty of California’s 58 counties have completed their random sampling. At the current validity rate, Six Californias will need 7,797 more raw signatures to qualify for a full count. (I think they’ll be lucky to get another 2,000.) The alternative is for their validity rate to increase to at least 67.6%. The largest county (in terms of raw signatures) to report in so far is San Joaquin, with 27,831 raw signatures and a validity rate of 72.7%. There are nine counties with more raw signatures than San Joaquin: Los Angeles (311,924), San Diego (97,450), San Bernardino (88,067), Riverside (74,478), Orange (52,217), Alameda (51,366), Sacramento (43,578), Fresno (38,382), and Santa Clara (38,366). If their validity rates are higher than the current 66.9% overall number, they could pull it up enough so that Six Californias will get a full count. Whether a full count would pull it up to the 71.1% needed to qualify for the ballot remains to be seen. (I doubt they can pull it up to the 78.2% necessary to qualify for the ballot without a full count.)

The counties have another month to complete their random sampling. And at the rate the reports are trickling in, it will probably take that long.

–Steve Chessin

President, Californians for Electoral Reform (CfER)

www.cfer.org

The opinions expressed here are my own and not necessarily those of CfER.

Report #6 on the Six Californias Signature Verification Process

Alameda County finally reported their raw signature count! According to Friday’s update from the SoS, they had 51,366 raw signatures (a collection rate of 6.4%, the same as the state average), bringing the total raw signature count up to 1,134,746. We’re still waiting for Amador, Inyo, and Trinity to report in, but with only 37,771 registered voters among them, I doubt they’ll contribute more than 2,500 signatures to the raw count.

Also in today’s update are San Francisco’s random sample results. They had a validity rate of 73.7%, bringing the overall validity rate back up to 66.8%. That gives a projection (as of today) of 758,010 valid signatures, not enough to qualify for a full count. (Throwing in my estimate of 2,500 raw signatures from the remaining three counties only adds another 1,670 signatures, still not enough to get a full count.)

In my previous report I discussed the concept of margin of error, so today I calculated it. If a county has a raw count of R, a sample size of S, and a projected validity rate of P (converting the percentage figure to a decimal fraction), then I calculated the margin of error in signatures as R*sqrt(P*(1-P)/S). (Of course, if S is the same as R, as it is for Alpine, Modoc, and Mono counties, the margin of error is zero.) For example, Kings County had 3,187 raw signatures, a sample size of 500, and a projected validity rate of 0.762. That means the margin of error on the projected 2,428 signatures is 61 signatures (about 2.5%).
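
Here is a minimal Python sketch of that per-county calculation, for anyone who wants to check the arithmetic. The function name `margin_of_error` is just an illustrative label, and the zero returned for a fully counted county is hard-coded to match the parenthetical remark above rather than derived from the formula.

```python
import math

def margin_of_error(raw_count, sample_size, validity_rate):
    """One-standard-deviation margin of error, in signatures, for one county.

    Implements R * sqrt(P * (1 - P) / S) as written in the post.
    The function name and the special case below are illustrative assumptions.
    """
    if sample_size >= raw_count:
        # Every signature was checked, so nothing is left to estimate.
        return 0.0
    return raw_count * math.sqrt(validity_rate * (1 - validity_rate) / sample_size)

# Kings County figures quoted above: R = 3,187, S = 500, P = 0.762.
kings = margin_of_error(3187, 500, 0.762)
projected = 3187 * 0.762
print(round(kings), round(projected), f"{kings / projected:.1%}")
```

With the Kings County inputs this should reproduce the 61 signatures and roughly 2.5% quoted above.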

Doing this calculation for all the counties that have reported so far and combining them (taking the square root of the sum of the squares) gives a margin of error of 795 signatures on the sum of the counties’ projections of 79,552, or about 1%.

Applying that 1% margin of error to my projection of 758,010 means that Mr. Draper could have as few as 750,430 valid signatures or as many as 765,590. (Actually, what it means is that there is a 68% probability that the true figure is between those limits.) But unless the final projected number of valid signatures is above the 767,235 necessary to trigger a full count, we’ll never know how many valid signatures he actually collected.
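
The combining step (square root of the sum of the squares) can be sketched in a few lines of Python. The per-county margins in the example list are hypothetical values chosen only to show the arithmetic; they are not the actual county figures, and `combine_margins` is an illustrative name.

```python
import math

def combine_margins(county_margins):
    """Combine independent per-county margins: sqrt of the sum of squares."""
    return math.sqrt(sum(m * m for m in county_margins))

# Hypothetical per-county margins (in signatures), for illustration only.
example = [61.0, 120.0, 430.0]
total = combine_margins(example)

# The combined margin is smaller than simply adding the margins together,
# because independent sampling errors partly cancel.
print(round(total), "vs. a plain sum of", round(sum(example)))
```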

One could argue that the criteria for doing a full count should take into consideration the estimated margin of error; that is, instead of projecting more than a fixed number (95% of the amount needed to qualify), if the projected range includes the amount needed to qualify then a full count should be done, but that’s not the way the law is written.

In a previous report I discussed how duplicate signatures were handled. Jim Riley has posted a good comment on that. In addition, my colleague David Cary has posted a PDF of his derivation of the estimation formula (much clearer and yet more rigorous than my hand-wavy one), as well as the PDF of the SoS’s one page description of the formula.

Report #5 on the Six Californias Signature Verification Process

Well, it’s another slow news day in the Six Californias signature verification world. There was no update from the SoS Wednesday. The only news in Thursday’s update was that the County of Santa Barbara finished their random sample, with a validity rate of 54.1%. This brings the overall validity rate down from 66.7% to 65.4%. Still no word from Alameda, Amador, Inyo, or Trinity counties as to their raw counts.

In my previous report I opined how the projected numbers made it seem unlikely that Six Californias would qualify for the ballot. It occurred to me that a random sample is subject to, well, randomness, and even if the projected number is below the number needed to qualify, a full count could reverse that. That indeed is what happened with the “State Fees on Hospitals” initiative that has qualified for the November 2016 ballot, so I thought a review of that initiative’s process might be educational.

Initiative 1613 (as it is known to the SoS) was filed late last April. By May 6th enough counties had submitted their raw counts to the SoS that she was able to declare on May 7th that more than 807,615 signatures had been filed and so the counties should begin their random sampling and report back no later than June 19th.

On June 19th, despite no projected numbers from Inyo, Mariposa, or Trinity counties, she reported that the initiative had a projected validity rate of 64.6% and a projected count of 787,693 signatures, not enough to qualify by random sample (which would have required a projected count of 888,377 signatures, 10% over the 807,615 minimum), but enough to require a full count of each and every signature. The full count was to complete by August 1st.

On August 1st she reported that, even without a full count from Kings County, the initiative had received either 807,950 or 807,984 valid signatures, enough (barely) to qualify for the ballot. (For some reason the spreadsheet shows different numbers in the “Valid Sigs.” and “Valid” columns for Humboldt and Imperial counties. Also, the total in the “Valid” column is off by one as well, making me think someone doesn’t understand how to create a spreadsheet that adds the numbers for you.) The actual validity rate was 66.4%, almost two percentage points higher than projected.

I know when one does sampling one should also compute the margin of error. To be rigorous, you have to compute the margin of error separately for each county, and then combine them by squaring each one, adding them together, and then taking the square root. I’m not going to do the complete calculation right now (it’s late and I’m tired; I might do it for Six Californias when they finish the random sampling), but an oversimplified estimate gives an overall margin of error on the order of 5%. Thus the actual validity rate of 66.4% is within the margin of error of the estimated one, which is why even if an initiative is projected to fall short by 5% a full count is done.

Report #4 on the Six Californias Signature Verification Process

It’s a slow news day on the Six Californias signature verification front. (You can find my previous updates here, here, and here.) According to Tuesday’s report from the SoS, a total of eight signatures were collected in Alpine County, of which five were valid (no duplicates), for a validity rate of 62.5%. Also, Yolo County apparently found an additional 27 raw signatures during its sampling process, bringing the total raw count to 1,083,380.

We’re still waiting for the raw counts from Alameda, Amador, Inyo, and Trinity, but it may be they won’t report until they finish their random sample. (This surprises me, because EC 9030(b) says they’re supposed to report their raw totals to the SoS within eight days after receiving the petitions. But I guess there’s no penalty for being late.)

In addition to the aforementioned Alpine County, we now have sampling reports from Kings (76.2% valid), Napa (66.0%), Shasta (69.0%), and Yolo (57.2%) counties. The overall validity rate is 66.7%, up very slightly from yesterday’s 66.4%.

Given the slow news, let’s speculate as to how many signatures Six Californias might pick up from the remaining four counties. According to the Statement of Vote from the June election, there are 17,722,006 registered voters(*) in California. The missing counties account for 841,499 of them. 1,083,380 signatures from a pool of 16,880,507 registered voters is a collection rate of 6.4%. If that same rate holds for the missing counties, we can expect Mr. Draper to pick up another 54,000 signatures or so. With two-thirds of them valid, he’ll have about 758,000 good signatures, not enough to qualify or even force a full count.
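
The back-of-the-envelope extrapolation above can be written out as a short Python sketch. The function and parameter names are illustrative, and the validity rate is taken as exactly two-thirds, as in the text.

```python
def extrapolate(raw_so_far, voters_reporting, voters_missing, validity_rate):
    """Extend the observed collection rate to the counties still missing."""
    collection_rate = raw_so_far / voters_reporting
    extra_raw = collection_rate * voters_missing
    projected_valid = (raw_so_far + extra_raw) * validity_rate
    return collection_rate, extra_raw, projected_valid

total_voters = 17_722_006      # registered voters statewide
missing_voters = 841_499       # registered voters in the four missing counties
rate, extra, valid = extrapolate(
    raw_so_far=1_083_380,
    voters_reporting=total_voters - missing_voters,
    voters_missing=missing_voters,
    validity_rate=2 / 3,
)
# Collection rate, extra raw signatures and projected valid signatures,
# the last two rounded to the nearest thousand.
print(f"{rate:.1%}", round(extra, -3), round(valid, -3))
```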

(*) I use registered voters instead of eligible voters because my admittedly limited experience with signature gatherers is that they ask people if they’re registered to vote; I’ve never seen one register a non-registered but eligible voter.

But 6.4% is just an average collection rate. In some counties he does better, in some he does worse. For example, the collection rate in Alpine County was only 8/766 or 1.0%. But in Stanislaus County it was 23,302/211,330 or 11.0%. Siskiyou was even better: 2,999/24,833 or 12.1%. The best county, unless I’ve made a mistake, was Del Norte, with a collection rate of 2,377/12,398 or 19.2%.

Of the remaining counties, Alameda is the largest, with 803,728 registered voters. It would be reasonable to expect the collection rate in Alameda County to be similar to that in the surrounding counties of Contra Costa (4.7%), San Francisco (4.7%), San Mateo (1.1%), Santa Clara (4.8%), and San Joaquin (9.5%), but let’s be generous and say it’s 20% there and in the other remaining counties, for an additional 168,300 signatures. If the current validity rate of 66.7% continues to hold, then he’ll qualify with about 834,500 signatures (not enough to avoid a full count, however). But if he only collects signatures from 15% of those voters, he’ll only have about 806,400; enough to force a count of every one, but not enough to qualify.
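
The two what-if scenarios above can be checked with a small Python sketch. The constant names are illustrative; the 767,235 full-count floor is the 95% threshold quoted elsewhere in these reports, and the validity rate is taken as two-thirds.

```python
RAW_SO_FAR = 1_083_380        # raw signatures reported so far
MISSING_VOTERS = 841_499      # registered voters in the four missing counties
VALIDITY = 2 / 3              # roughly the current 66.7% validity rate
NEEDED_TO_QUALIFY = 807_615   # valid signatures required
FULL_COUNT_FLOOR = 767_235    # 95% of the requirement

def projected_valid(assumed_collection_rate):
    """Projected valid signatures if the missing counties collect at this rate."""
    extra_raw = MISSING_VOTERS * assumed_collection_rate
    return (RAW_SO_FAR + extra_raw) * VALIDITY

for rate in (0.20, 0.15):
    valid = projected_valid(rate)
    verdict = "reaches" if valid >= NEEDED_TO_QUALIFY else "falls short of"
    print(f"{rate:.0%}: about {valid:,.0f} valid, {verdict} {NEEDED_TO_QUALIFY:,}")
```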

Report #3 on the Six Californias Signature Verification Process

The SoS has released the latest random sample report for Tim Draper’s initiative to divide the state into six Californias.

Calaveras, Humboldt, Kings, Modoc, Mono, Nevada, and Ventura counties have turned in their raw counts, bringing Tim Draper’s total to 1,083,353 raw signatures (it was 1,038,836 in my first report). That lowers the validity rate he needs to qualify to 74.5% (was 77.7%) and to avoid a full count to 82.0% (was 85.5%). Below 70.8% (was 73.9%) and he doesn’t even get a full count. We’re still waiting for Alameda, Alpine, Amador, Inyo, and Trinity Counties to report their raw numbers. If they bring the raw total up to the 1.3 million claimed, then he needs 62.1% to qualify, 68.3% to avoid a full count.
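
Since the required validity rates are just the fixed signature thresholds divided by the raw count, they are easy to recompute whenever the raw total changes. A minimal Python sketch follows; the three thresholds are the figures from the first report, and the function and key names are illustrative.

```python
NEEDED_TO_QUALIFY = 807_615   # valid signatures required
QUALIFY_BY_SAMPLE = 888_377   # 110% of the requirement
FULL_COUNT_FLOOR = 767_235    # 95% of the requirement

def required_rates(raw_count):
    """Validity rates (percent, one decimal place) needed at a given raw count."""
    return {
        "qualify": round(100 * NEEDED_TO_QUALIFY / raw_count, 1),
        "avoid full count": round(100 * QUALIFY_BY_SAMPLE / raw_count, 1),
        "get a full count": round(100 * FULL_COUNT_FLOOR / raw_count, 1),
    }

print(required_rates(1_083_353))   # raw total as of this report
print(required_rates(1_300_000))   # the 1.3 million claimed
```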

Also, the following counties have completed their random samples (with validity rates as noted): Merced (66.7%), Modoc (65.4%), Mono (81.0%), Placer (72.5%), and San Joaquin (72.7%). The uncorrected validity rate is 71.8%, up from 70.7% in the first report. When one corrects for duplicates, the validity rate is 66.4%, up from 58.1%.

Speaking of correcting for duplicates, I think I’ve convinced myself that I now understand where the “-1” comes from in the correction factor for duplicate signatures. It’s best explained with an example.

Suppose I have 100 signatures, and I pick 25 of them (one fourth of 100) at random to check. Of the 25 signatures, I find that one person (Mary) isn’t registered to vote, and one person who is registered (John) has signed twice. That means I have 23 valid signatures and 2 invalid ones (Mary’s and one of John’s). The uncorrected validity rate, before the extra accounting for duplicates, is 92% (23/25).

Remember that these signatures were picked at random, so if I found two signatures from John in the 25 I picked, it’s likely that there are three others from John in the other 75. (Well, maybe not likely, but that’s the best estimate.) So John really accounts for 4 duplicate signatures, not just one. But we already accounted for one of those duplicates by calling it invalid in our sample, so we just have to account for the 3 extra duplicates in the unsampled portion.

Also, if John signed more than once in this sample of 25, we can suppose that there are probably three other people in the other 75 who also signed more than once, and the best estimate is that they each also signed five times (one of which is a valid signature in our sample). So a factor of 4 (100/25) for the four people (John plus an estimate of three others) who signed more than once, times 3 (4 – 1) for the fact that one of each duplicate is already accounted for by the uncorrected calculation, means John’s duplicate signature should be given a weight of 12. 12/100 is 12%, so the corrected validity rate is 80%.

Of course, if we found two people in the sample of 25 who signed twice, or if we found three signatures from John in that sample (one that we consider valid and two that we consider invalid), we’d have twice the correction factor (24%), etc.
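
The arithmetic of this example can be sketched in Python. The helper `duplicate_weight` is an illustrative name for the "factor of 4 times 3" reasoning above, not anything taken from the regulations.

```python
def duplicate_weight(total_signatures, sample_size):
    """Weight given to each duplicate found in the sample: A * (A - 1)."""
    a = total_signatures / sample_size   # 100 / 25 = 4 in the example
    return a * (a - 1)

total, sample, valid_in_sample = 100, 25, 23

uncorrected = total * valid_in_sample / sample   # projected valid before correction
weight = duplicate_weight(total, sample)         # weight of John's one duplicate
corrected = uncorrected - 1 * weight             # one duplicate found in the sample

print(uncorrected, weight, corrected)
# Two duplicates in the sample would double the correction.
print(2 * weight)
```

With 100 signatures in total, these projected counts can be read directly as percentages.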

Now before you start thinking “Gee, if I’m against a petition, I should sign it as many times as I can instead of not signing it at all so as to drive up the duplicate rate, since duplicate signatures hurt more than plain invalid ones”, I have to point out that this is illegal. Election Code section 18612 says “Every person is guilty of a misdemeanor who knowingly signs his or her own name more than once to any initiative, referendum, or recall petition ….” Deliberately signing a false name, while hurting the petition less than signing twice, carries a harsher penalty. Election Code section 18613 says “Every person who subscribes to any initiative, referendum, or recall petition a fictitious name […] is guilty of a felony and is punishable by imprisonment pursuant to subdivision (h) of Section 1170 of the Penal Code for two, three, or four years.” So don’t do it.

Update to Six Californias Signature Verification Progress Report

While the SoS hasn’t posted a new report since Wednesday (there was a tantalizing broken link yesterday that implied there was an update, but that was a false alarm), I did find out why there was a discrepancy between my numbers and hers. It has to do with how one accounts for duplicates. And it isn’t simple.

The regulations that describe how to verify signatures (a pdf version is available here) specify how this is done, and the SoS’s office sent me a nice one-page summary of the formula. They couldn’t provide me with the mathematical background for the formula, however, so I did a web search on the phrase “sampling petitions for duplicate signatures” (I prefer Yahoo! but you can use whatever search engine you like) and that led me to this paper. It’s pretty heavy sledding unless you have a good background in statistics (which I do not), but the takeaway is that duplicates affect the validity rate in approximate inverse proportion to the square of the sample size. That is, if you sample 10% of the signatures, while each invalid signature in the sample represents 10 in the total, each duplicate in the sample represents 90 in the total. (This is using the SoS’s formula, which may or may not be identical to the one in the paper.)

The exact formula goes like this:

Let V = (raw count) * (valid signatures in sample) / (sample size).

This is the uncorrected projected valid signatures. Note that

(valid signatures in sample) / (sample size) is the uncorrected validity rate; this is what I reported in my previous post.

Let A = (raw count) / (sample size). They call this the “value of each (sampled) signature”; it’s the inverse of the sample fraction. You’ll note that V is A * (valid signatures in sample).

Let B = A * (A – 1). This is the “extra value” of each duplicate. (I’m not sure where the “-1” comes from, but I’ll take their word for it.)

Let C = B * (number of duplicate signatures). This is the correction factor due to the duplicate signatures.

Then V – C is the corrected projected valid signatures,

and (V – C) / (raw count) is the corrected validity rate.
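
For readers who prefer code to algebra, here is a minimal Python sketch of the steps above, using the same letters. The function name is illustrative, and the county in the example is made up (5,000 raw signatures, a 10% sample, 350 valid, 2 duplicates); it is not one of the reported counties.

```python
def corrected_projection(raw_count, sample_size, valid_in_sample, duplicates):
    """Return (corrected projected valid signatures, corrected validity rate)."""
    a = raw_count / sample_size   # A: value of each sampled signature
    v = a * valid_in_sample       # V: uncorrected projected valid signatures
    b = a * (a - 1)               # B: extra value of each duplicate
    c = b * duplicates            # C: correction due to duplicates
    return v - c, (v - c) / raw_count

# Hypothetical county, for illustration only.
valid, rate = corrected_projection(5000, 500, 350, 2)
print(valid, f"{rate:.1%}")
```

In this made-up case A is 10 and B is 90, matching the "each duplicate represents 90" figure for a 10% sample.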

In any event, when I use the SoS’s formula, I do indeed get the same results. For the four counties reported so far, we have corrected validity rates of 76.4% (Sierra), 54.8% (Solano), 57.8% (Sonoma), and 75.8% (Sutter). The overall validity rate so far (calculated by adding the corrected projected valid signatures from each of those four counties and dividing that sum by the sum of the raw counts of the four counties) is 58.1%.

We’ll have to wait for more counties to report their results to see if Six Californias is likely to make it to the ballot. I can’t guarantee I’ll report on every update the SoS releases, but I’ll try.

Thanks and a tip of the hat to Katherine Montgomery of the Secretary of State’s office for providing me with both the regulations and the one-page “cheat sheet”, and to former CfER Board member David Cary for alluding to the inverse square dependency and suggesting the keywords to use in a web search.

Six Californias Signature Verification Progress Report

The Secretary of State has begun posting the random sample updates for Tim Draper’s initiative to divide the state into six Californias. You can find the most current update at http://www.sos.ca.gov/election… but I’ll summarize today’s for you.

According to the report, Draper turned in 1,038,836 raw signatures. He needs at least 807,615 of them to be valid for his measure to get on the ballot. That’s 77.7% of his raw count. Keep that number in mind; we’ll need it later.

First the SoS does (or rather, the counties do) a random sampling. Each county verifies 3% of the raw signatures at random (or 500, if greater, or all of them, if fewer) and projects from that a validity rate. If they project that he has at least 888,377 valid signatures (110% of the requirement, and 85.5% of the raw count), then the measure qualifies. If they project that he has fewer than 767,235 valid signatures (95% of the requirement; 73.9% of the raw count), then it doesn’t qualify. If they project a number somewhere in between those two limits, they have to check every signature.
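
The rules in the paragraph above can be sketched as two small Python functions. This is only my reading of the rules as summarized here, not the official procedure: the function names are illustrative, and rounding the 3% sample up to a whole signature is an assumption.

```python
import math

REQUIRED = 807_615  # valid signatures needed to qualify

def sample_size(raw_count):
    """3% of the raw count, or 500 if that is greater, or all of them if fewer."""
    # Assumption: a fractional 3% is rounded up to the next whole signature.
    return min(raw_count, max(500, math.ceil(0.03 * raw_count)))

def outcome(projected_valid):
    """Classify a projected number of valid signatures."""
    if projected_valid >= 1.10 * REQUIRED:
        return "qualifies on the sample"
    if projected_valid < 0.95 * REQUIRED:
        return "fails on the sample"
    return "full count"

print(sample_size(208), sample_size(3_000), sample_size(100_000))
print(outcome(888_377), outcome(800_000), outcome(767_234), sep=" | ")
```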

As of 1:24pm today, results are in from Sierra, Solano, Sonoma, and Sutter counties. In Sierra County, they checked all 208 signatures and found 159 (76.4%) to be valid. In each of the other counties they had to check 500 signatures. The validity rates were 67.4% (Solano), 64.6% (Sonoma), and 77.8% (Sutter)(*). Overall, out of 1,708 signatures checked, 1,208 were found to be valid, for an overall validity rate of 70.7%.
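
The tally above is a simple ratio of valid signatures to signatures checked, which can be reproduced with a short Python sketch. The counts are the ones quoted in this post; the variable names are illustrative.

```python
# (county, signatures checked, signatures found valid)
results = [
    ("Sierra", 208, 159),
    ("Solano", 500, 337),
    ("Sonoma", 500, 323),
    ("Sutter", 500, 389),
]

for county, checked_n, valid_n in results:
    print(f"{county}: {valid_n / checked_n:.1%}")

checked = sum(n for _, n, _ in results)
valid = sum(v for _, _, v in results)
print(f"Overall: {valid:,} of {checked:,} = {valid / checked:.1%}")
```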

Now 1,708 is less than two-tenths of a percent of the signatures Draper collected, and it could be that he’ll have a higher validity rate in the rest of the state. But if Sutter turns out to be his best county, Six Californias won’t be on the ballot.

(*) The right-most column of the spreadsheet reports different percentages but they don’t agree with the simple calculations of 337/500, 323/500, and 389/500, respectively. I don’t know how the SoS got those other numbers and perhaps someone with a day job that allows them that kind of research can contact the SoS and find out what they are doing differently.