Intro:
Atheist Luke Muehlhauser interviewed Christian theist Lydia McGrew on the topic (partially, at least) of the application of Bayes' theorem to historical inquiry. Later, Muehlhauser interviewed atheist Richard Carrier on, among other things, the same subject. In that podcast interview Carrier dismissed McGrew's paper on the topic, which sent her husband, Tim McGrew, over to fellow Christian Victor Reppert's blog to argue that Carrier doesn't know what he's talking about.
However, nitpicking what is meant to be an introduction to a difficult mathematical subject (even textbooks contain basic errors) simply doesn't prove that Carrier's forthcoming book on the topic is doomed to failure, even if some of the examples are legitimate. There are only three supposed errors here, and only one shows any promise. If Tim McGrew or someone else comes up with something valid, Carrier will just correct the text. They aren't going to refute Bayes' theorem or its application to history, as I'm sure they'd agree.
Note, I sent the original incarnation of this post (that was meant to be a concise summary of the "errors") to Carrier and I've been given permission to reproduce his comments (which will only show up here on the web, to my knowledge).
Problem one:
Carrier says:
There are numerous statistical fallacies and statistical illusions (where the correct result violates common sense intuitions). The more of these you are aware of, the better your reasoning will be. An example of a pertinent statistical fallacy arises from the Talpiot tomb discovery (the so-called “Jesus Tomb”), where it was claimed that a particular conjunction of five names was highly improbable, but the fact that there were ten burials in that same tomb was ignored, a serious error. The probability of getting five specific names in a sample of five is indeed low, but the probability of getting those same five names in a sample of ten is much greater. For example, if 1 in 4 people were named Mark, and you picked three people at random, the odds that they would all be named Mark would be 0.25^3 = 0.016 = 1.6%, in other words very unlikely, but if you picked ten people at random, the odds that any three of them would be named Mark would be 1 – 0.75^7 = 1 – 0.133 = 0.867 = 87%, in other words very likely. This is the kind of statistical fallacy you need to be aware of if you decide to employ statistical logic in your historical method.
However, a Christian (I'm assuming these are all Christians, I haven't double checked) named Tim over on Victor Reppert's blog says:
There's a cookie for the first person who can explain why this calculation, winding up with "87%," is completely bogus; bonus cookie for the first person to give the proper calculation. (Hint: remember nCr from basic statistics?)
And the Duke of Earl answers his request:
Okay, in the binomial coefficient equations.
10!/(3!x7!) = 120.
120(0.25^3)(0.75^7)=0.25
So the probability that 3 people in a group of ten are named Mark where 25% of the population is named Mark is 0.25
I won't call it 25% because probabilities are not presented in percentages.
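Duke's figure is easy to check. Here's a minimal Python sketch (the function name is mine, not Duke's) computing the probability of exactly three Marks in a sample of ten:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly 3 Marks among 10 people, if 25% of the population is named Mark:
# C(10,3) * 0.25^3 * 0.75^7
print(round(binom_pmf(3, 10, 0.25), 4))  # ≈ 0.2503
```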
Tim calls that answer good:
A cookie for Duke! Two cookies, in fact! (Is your browser cookie-enabled, Duke?)
Richard Carrier responds:
The information he is leaving out of his math is that the Talpiot tomb has missing names, i.e. we *don't know* what the other names are (as my example states). Thus Duke is calculating for finding exactly three Marks (no more), not for there being *at least* three (i.e there might be 3, 4, 5, 6, 7, 8, 9, or even 10 Marks). But he is right that the correct math is more complex than I use (I gave only the equation for at least 1 Mark in a group of 7, not at least 3 in a group of 10) and the correct result is thus slightly different than I gave, and I'm glad to be reminded of this so I can revise the tutorial. It now has the correct equation: if you picked ten people at random, the odds that at least three of those ten were named Mark would be the converse of the probability of there being less than three Marks in those ten (i.e. the probability of finding only 0 Marks, plus the probability of finding only 1 Mark, plus the probability of finding only 2 Marks), or 1-[(10!/2!8!)(0.25^2)(0.75^8)+
(10!/1!9!)(0.25^1)(0.75^9)+(10!/0!10!)(0.25^0)(0.75^10)] = 1-[(45)(0.0625)(0.1001)+(10)(0.25)(0.0751)+(1)(1)(0.0563)] = 1-[0.2816+0.1877+0.0563] = 1 - 0.5256 = 0.4744. In other words almost 50/50, which is very likely. If there is almost a 50/50 chance of at least three Marks in a tomb of ten, finding three named Mark there is not improbable at all. (Like mission control in *Apollo 13* if I have erred anywhere in my arithmetic, please check it and let me know and I'll correct it, but otherwise the equation is correct). Note that the probability of three names chosen at random from those ten then being these Marks will still be less than this (and differently so for observing three Marks if five names are selected). But in the Talpiot case, the accident of which names remain unmarked is irrelevant to the hypothesis being tested.
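Carrier's corrected figure (at least three Marks among ten, rather than exactly three) can be verified the same way; this sketch is mine, not from his tutorial:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(at least 3 Marks in 10) = 1 - [P(0 Marks) + P(1 Mark) + P(2 Marks)]
p_at_least_3 = 1 - sum(binom_pmf(k, 10, 0.25) for k in range(3))
print(round(p_at_least_3, 4))  # ≈ 0.4744, i.e. almost 50/50
```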
Problem 1, part 2:
Tim continues:
Extra credit -- and Duke, perhaps you should just eat your cookies and let someone else have a crack at it -- to what question would Carrier's calculation yield the right answer?
To which the omnipresent internet commenting deity known as Anon answers:
Working backwards.
Carrier's calculation 1-(.75^7)=.86 takes the form of the basic probability formula, 1-P(A)=P(not-A). So in Carrier's example, P(A)= .75^7
.75^7 is the probability that the first 7 people you meet successively are not named Mark (in much the same way as how P(first seven coin flips being heads)=.5^7) .
Hence, P(not-A)= the probability that it is not the case that the first 7 people you meet successively are not named Mark.
So Dr. Carrier should have asked something along the lines of, "What is the probability that it is not the case that the first 7 people you meet successively are not named Mark?"
Tim says:
Yes indeed! Please note that this has absolutely nothing to do with ten guys or with three guys: it's all about seven, namely (to rephrase Anon's version) that if you meet seven guys, at least one of them will be named Mark.
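Anon's reading checks out numerically: 1 − 0.75^7 is exactly the probability of at least one Mark among seven people (the complement of "none of the seven is named Mark"). A quick sketch of my own:

```python
# With P(named Mark) = 0.25, the chance that none of 7 people is a Mark
# is 0.75^7; the complement is "at least one Mark among the seven".
p_at_least_one = 1 - 0.75**7
print(round(p_at_least_one, 3))  # ≈ 0.867, Carrier's original "87%"
```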
Problem 2:
Tim moves on to the second problem:
...let's start with a conceptual question. Carrier offers this definition:
~h = all other hypotheses that could explain the same evidence (if h is false)
Question: what role does the phrase "could explain the same evidence" play in the definition of ~h? [Warning: this is a trick question.]
Mr. Veale adds:
~h includes hypotheses that lower the probability of the evidence.
And then Mr. Veale says:
~h is just all the hypotheses that aren't h. That's it. You consider them before you consider the evidence.
That information is concealed in that subtle little word prior
Mattghg suggests:
Um, is the answer: no role at all? The definition of ~h should just be 'h is false', right?
Tim says that Mr. Veale and Mattghg are correct. This one seems to be a matter of nitpicking.
Richard Carrier responds:
I'm still not sure what they are saying is supposed to be an error here. The statement "The definition of ~h should just be 'h is false', right?" is a statement entailed by my statement. So they aren't contradicting anything I said. So what's mistaken? If any hypothesis exclusive of h is true, then h is false (by obvious deductive logic); therefore if ~h includes all hypotheses exclusive of h, then if any one of them is true, h is false (and conversely if h is true, all of them are false).
The reason ~h must include all hypotheses that explain e (but that entail h is false) is mathematical: the sample space must be complete (you can't get a correct ratio if your divisor does not equal the total of all possibles). For example, if h is "Joe got rich by winning the lottery" and e is "Joe got rich" then ~h must include all the other ways Joe can get rich (each one of which can be re-framed as h, and then "Joe got rich by winning the lottery" must become one of the hypotheses included in ~h; as all hypotheses must be commutable this way, all hypotheses must be included in ~h). For example, if data showed that there are only 100 rich people, 10 got rich by winning the lottery, 80 got rich by business, and 1 got rich by space aliens, that leaves 9 unaccounted for. If you calculated the prior odds that Joe got rich by winning the lottery without those unaccounted possibles you'll get the wrong result: 10/91 when there are 100 rich people; if there are 100 rich people then the prior odds Joe got rich by winning the lottery must be 10/100, not 10/91; therefore those other 9 unaccounted for causes of getting rich must be included in ~h, even if you don't know what they are (this gets even more complicated when you address the fact that you can never have a complete sample, e.g. those 100 rich people aren't the only rich people there are, were, or ever will be; this is addressed, of course, with sampling probabilities, etc., but the mathematical fact remains the same that in any sample of 100, the frequency of x must always be x/100, which entails that all ~x must be accounted for, even if by sweeping categories like "unknown causes").
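Carrier's lottery example can be made concrete in a few lines of arithmetic (a sketch of my own, using his numbers):

```python
# 100 rich people: 10 got rich by lottery, 80 by business, 1 by aliens,
# leaving 9 unaccounted for.
lottery, business, aliens = 10, 80, 1
total = 100
unknown = total - (lottery + business + aliens)  # 9 unaccounted causes

# Wrong prior: divide only by the causes you happened to enumerate.
wrong_prior = lottery / (lottery + business + aliens)  # 10/91 ≈ 0.1099

# Correct prior: the sample space must be complete, so the 9 unknowns
# still belong in the denominator (lumped into ~h as "unknown causes").
correct_prior = lottery / total  # 10/100 = 0.10
print(round(wrong_prior, 4), correct_prior)
```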
This can be demonstrated formally by expanding the equation to multiple hypotheses (see my formula for that, it's in the same document: PDF p. 4, and p. 15, for expanding ~h into h1, h2, and h3, which can be continued to any h{n}). It can be shown that a sum of probability formulas for three (or any number of) hypotheses alternative to h necessarily equals a single probability formula for ~h alone; therefore a single ~h by definition includes all three hypotheses. This can be iterated to all possible hypotheses. It's just that most of them have a P(h|b) and P(e|h.b) so small we don't even need to count them, e.g. "Joe got rich by my spitting to the left on Wednesday" has a nonzero prior probability (by virtue of our non-omniscience) and a nonzero consequent probability (ditto), but each so small they can have zero observable effect to any decimal place we'd ever bother caring about (so we ignore them). But this still means ~h includes even that hypothesis, as a matter of necessary logic: e.g. we could give it a formula box in the denominator as h4, say, which entails that any single denominator for only ~h alone would have to include the numbers for this h4 (and therefore it always does, it just doesn't matter because those numbers are so small).
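The expansion Carrier describes can be illustrated numerically: summing the denominator terms over separate alternatives h1, h2, h3 gives the same total probability of the evidence as collapsing them into a single ~h. The priors and likelihoods below are made-up numbers of my own, purely for illustration:

```python
# Hypothetical priors (must sum to 1) and likelihoods P(e|x.b).
priors = {"h": 0.5, "h1": 0.3, "h2": 0.15, "h3": 0.05}
likelihoods = {"h": 0.8, "h1": 0.4, "h2": 0.2, "h3": 0.01}

# Expanded denominator: sum P(x|b)P(e|x.b) over every hypothesis.
expanded = sum(priors[x] * likelihoods[x] for x in priors)

# Collapsed form: treat h1, h2, h3 as one ~h, with P(e|~h.b) as the
# prior-weighted average of the alternatives' likelihoods.
p_not_h = 1 - priors["h"]
p_e_given_not_h = sum(priors[x] * likelihoods[x]
                      for x in ("h1", "h2", "h3")) / p_not_h
collapsed = priors["h"] * likelihoods["h"] + p_not_h * p_e_given_not_h

print(expanded, collapsed)  # identical: ~h subsumes all the alternatives
```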
But exactly what part of all that that they want to object to is unclear to me.
Problem 3:
Tim says:
On p. 4, Carrier gives the following definition:
P(~h|b) = 1 – P(h|b) = the prior probability that h is false = the sum of the prior probabilities of all alternative explanations of the same evidence (e.g. if there is only one viable alternative, this means the prior probability of all other theories is vanishingly small, i.e. substantially less than 1%, so that P(~h|b) is the prior probability of the one viable competing hypothesis.)
[...] what is wrong with the explanation being offered here?
No one has answered this yet, but Tim gives a hint:
[Hint: does viability have anything to do with P(~h|b)? If sub-hypotheses under ~h have non-zero probability given b, even though that probability is low, do they still contribute to P(~h|b)?]
Mike responds:
P(~h|b) = 1 – P(h|b) simply means there is a 100% chance one of the two is correct. Assigning "viability" to one or the other simply exposes your priors.
To which Tim responds:
You're in the zone -- have a peppermint -- but there's something more direct to be said. Every sub-hypothesis under ~h that has a non-zero prior given b contributes to P(~h|b). So to say that if
the prior probability of all other theories is vanishingly small, i.e. substantially less than 1%,
then
P(~h|b) is the prior probability of the one viable competing hypothesis
is just mathematically wrong.
Richard Carrier responds:
The statement is that *if* there is one and only one viable *alternative* hypothesis (PDF p. 4) then "the prior probability of all *other* theories" i.e. all theories that are neither h nor this one viable alternative "is vanishingly small, i.e. substantially less than 1%." Which is actually just a tautology (I'm simply defining "viability," and wholly uncontroversially I might add), so they can have no objection to it. They are mistakenly assuming "all other theories" means "other than h" when I am clearly saying "other than h *and* the one proposed viable alternative." Once that is explained to them they should concede the point. (I italicized the word "other" in both instances in the hopes of making this clearer, although it should have been clear enough already).
Outro:
Muehlhauser seems a bit premature in trying to protect his intellectual reputation:
When asked to guess at the competence in probability theory between two people who have been publishing peer-reviewed philosophy literature on probability theory for at least a decade [that would be the McGrews] vs. someone who discovered Bayes’ Theorem in the last few years [that would be Carrier], I’m going to bet on the former in a heartbeat.
Unfortunately, that's a false dichotomy even from a non-expert perspective, since, as I pointed out, Carrier says he's had his material vetted by qualified people who generally approved of it with minimal changes. The retaliatory Christians out and about on the internet on this issue are conveniently ignoring that (and continue to do so). Further, the disagreement between the McGrews and Carrier turned on miscommunication, not mathematical competency, as Carrier and Lydia McGrew eventually agreed.
Ben