January 7, 2011

  • Is Richard Carrier wrong about Bayes' theorem?

    Intro:

    Atheist Luke Muehlhauser interviewed Christian theist Lydia McGrew on (among other things) the application of Bayes' theorem to historical inquiry.  Later, Muehlhauser interviewed atheist Richard Carrier, and that conversation covered the same subject.  In that podcast interview Carrier dismissed McGrew's paper on the topic, which set her husband, Tim McGrew, off on fellow Christian Victor Reppert's blog to show that Carrier doesn't know what he's talking about.

    However, nitpicking what is meant to be an introduction to a difficult mathematical subject (as though even textbooks were free of basic errors) simply doesn't prove that Carrier's forthcoming book on the topic is doomed to failure, even if some of the examples are legitimate.  There are only three supposed errors here, and only one shows any promise.  If Tim McGrew or someone else comes up with something valid, Carrier will simply correct the text.  They aren't going to refute Bayes' theorem or its application to history, as I'm sure they'd agree.

    Note, I sent the original incarnation of this post (which was meant to be a concise summary of the "errors") to Carrier, and I've been given permission to reproduce his comments (which, to my knowledge, appear only here on the web).


    Problem 1:

    Carrier says:

    There are numerous statistical fallacies and statistical illusions (where the correct result violates common sense intuitions). The more of these you are aware of, the better your reasoning will be.  An example of a pertinent statistical fallacy arises from the Talpiot tomb discovery (the so-called “Jesus Tomb”), where it was claimed that a particular conjunction of five names was highly improbable, but the fact that there were ten burials in that same tomb was ignored, a serious error. The probability of getting five specific names in a sample of five is indeed low, but the probability of getting those same five names in a sample of ten is much greater. For example, if 1 in 4 people were named Mark, and you picked three people at random, the odds that they would all be named Mark would be 0.25^3 = 0.016 = 1.6%, in other words very unlikely, but if you picked ten people at random, the odds that any three of them would be named Mark would be 1 – 0.75^7 = 1 – 0.133 = 0.867 = 87%, in other words very likely. This is the kind of statistical fallacy you need to be aware of if you decide to employ statistical logic in your historical method.
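    For reference, the raw arithmetic in that passage is easy to reproduce (a quick Python sketch; purely illustrative):

```python
# The arithmetic from Carrier's example, with p = 0.25 for "named Mark"
p_three_specific = 0.25**3         # three specific people all named Mark
p_at_least_one_of_7 = 1 - 0.75**7  # at least one Mark among 7 people

print(round(p_three_specific, 3))     # 0.016
print(round(p_at_least_one_of_7, 2))  # 0.87
```

    Whether that second calculation answers the question Carrier posed is precisely what the dispute below is about.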

    However, a Christian (I'm assuming these are all Christians; I haven't double-checked) named Tim over on Victor Reppert's blog says:

    There's a cookie for the first person who can explain why this calculation, winding up with "87%," is completely bogus; bonus cookie for the first person to give the proper calculation. (Hint: remember nCr from basic statistics?)

    And the Duke of Earl answers his request:

    Okay, in the binomial coefficient equations.

    10!/(3!x7!) = 120.

    120(0.25^3)(0.75^7)=0.25

    So the probability that 3 people in a group of ten are named Mark where 25% of the population is named Mark is 0.25

    I won't call it 25% because probabilities are not presented in percentages.
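    Duke's figure checks out; here is the same binomial term as a Python sketch:

```python
from math import comb

# Duke's calculation: exactly 3 Marks among 10, with p = 0.25
coefficient = comb(10, 3)                      # 10!/(3! * 7!) = 120
p_exactly_3 = coefficient * 0.25**3 * 0.75**7

print(coefficient)            # 120
print(round(p_exactly_3, 4))  # 0.2503
```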

    Tim calls that answer good:

    A cookie for Duke! Two cookies, in fact! (Is your browser cookie-enabled, Duke?)

    Richard Carrier responds:

    The information he is leaving out of his math is that the Talpiot tomb has missing names, i.e. we *don't know* what the other names are (as my example states). Thus Duke is calculating for finding exactly three Marks (no more), not for there being *at least* three (i.e there might be 3, 4, 5, 6, 7, 8, 9, or even 10 Marks). But he is right that the correct math is more complex than I use (I gave only the equation for at least 1 Mark in a group of 7, not at least 3 in a group of 10) and the correct result is thus slightly different than I gave, and I'm glad to be reminded of this so I can revise the tutorial. It now has the correct equation: if you picked ten people at random, the odds that at least three of those ten were named Mark would be the converse of the probability of there being less than three Marks in those ten (i.e. the probability of finding only 0 Marks, plus the probability of finding only 1 Mark, plus the probability of finding only 2 Marks), or 1-[(10!/2!8!)(0.25^2)(0.75^8)+
    (10!/1!9!)(0.25^1)(0.75^9)+(10!/0!10!)(0.25^0)(0.75^10)] = 1-[(45)(0.0625)(0.1001)+(10)(0.25)(0.0751)+(1)(1)(0.0563)] = 1-[0.2816+0.1877+0.0563] = 1 - 0.5256 = 0.4744. In other words almost 50/50, which is very likely. If there is almost a 50/50 chance of at least three Marks in a tomb of ten, finding three named Mark there is not improbable at all. (Like mission control in *Apollo 13* if I have erred anywhere in my arithmetic, please check it and let me know and I'll correct it, but otherwise the equation is correct). Note that the probability of three names chosen at random from those ten then being these Marks will still be less than this (and differently so for observing three Marks if five names are selected). But in the Talpiot case, the accident of which names remain unmarked is irrelevant to the hypothesis being tested.
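    Carrier's revised figure also checks out numerically; here is the same complement calculation as a Python sketch:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(at least 3 Marks among 10) = 1 - [P(0 Marks) + P(1 Mark) + P(2 Marks)]
p_at_least_3 = 1 - sum(binom_pmf(k, 10, 0.25) for k in range(3))
print(round(p_at_least_3, 4))  # 0.4744
```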


    Problem 1, part 2:

    Tim continues:

    Extra credit -- and Duke, perhaps you should just eat your cookies and let someone else have a crack at it -- to what question would Carrier's calculation yield the right answer?

    To which the omnipresent internet commenting deity known as Anon answers:

    Working backwards.

    Carrier's calculation 1-(.75^7)=.86 takes the form of the basic probability formula, 1-P(A)=P(not-A). So in Carrier's example, P(A)= .75^7

    .75^7 is the probability that the first 7 people you meet successively are not named Mark (in much the same way as how P(first seven coin flips being heads)=.5^7) .

    Hence, P(not-A)= the probability that it is not the case that the first 7 people you meet successively are not named Mark.

    So Dr. Carrier should have asked something along the lines of, "What is the probability that it is not the case that the first 7 people you meet successively are not named Mark?"

    Tim says:

    Yes indeed! Please note that this has absolutely nothing to do with ten guys or with three guys: it's all about seven, namely (to rephrase Anon's version) that if you meet seven guys, at least one of them will be named Mark.

    Problem 2:

    Tim moves on to the second problem:

    ...let's start with a conceptual question. Carrier offers this definition:

    ~h = all other hypotheses that could explain the same evidence (if h is false)

    Question: what role does the phrase "could explain the same evidence" play in the definition of ~h? [Warning: this is a trick question.]

    Mr. Veale adds:

    ~h includes hypotheses that lower the probability of the evidence.

    And then Mr. Veale says:

    ~h is just all the hypotheses that aren't h. That's it. You consider them before you consider the evidence.

    That information is concealed in that subtle little word prior

    Mattghg suggests:

    Um, is the answer: no role at all? The definition of ~h should just be 'h is false', right?

    Tim says that Mr. Veale and Mattghg are correct.  This one seems to be a matter of nitpicking. 

    Richard Carrier responds:

    I'm still not sure what they are saying is supposed to be an error here. The statement "The definition of ~h should just be 'h is false', right?" is a statement entailed by my statement. So they aren't contradicting anything I said. So what's mistaken? If any hypothesis exclusive of h is true, then h is false (by obvious deductive logic); therefore if ~h includes all hypotheses exclusive of h, then if any one of them is true, h is false (and conversely if h is true, all of them are false).

    The reason ~h must include all hypotheses that explain e (but that entail h is false) is mathematical: the sample space must be complete (you can't get a correct ratio if your divisor does not equal the total of all possibles). For example, if h is "Joe got rich by winning the lottery" and e is "Joe got rich" then ~h must include all the other ways Joe can get rich (each one of which can be re-framed as h, and then "Joe got rich by winning the lottery" must become one of the hypotheses included in ~h; as all hypotheses must be commutable this way, all hypotheses must be included in ~h). For example, if data showed that there are only 100 rich people, 10 got rich by winning the lottery, 80 got rich by business, and 1 got rich by space aliens, that leaves 9 unaccounted for. If you calculated the prior odds that Joe got rich by winning the lottery without those unaccounted possibles you'll get the wrong result: 10/91 when there are 100 rich people; if there are 100 rich people then the prior odds Joe got rich by winning the lottery must be 10/100, not 10/91; therefore those other 9 unaccounted for causes of getting rich must be included in ~h, even if you don't know what they are (this gets even more complicated when you address the fact that you can never have a complete sample, e.g. those 100 rich people aren't the only rich people there are, were, or ever will be; this is addressed, of course, with sampling probabilities, etc., but the mathematical fact remains the same that in any sample of 100, the frequency of x must always be x/100, which entails that all ~x must be accounted for, even if by sweeping categories like "unknown causes").

    This can be demonstrated formally by expanding the equation to multiple hypotheses (see my formula for that, it's in the same document: PDF p. 4, and p. 15, for expanding ~h into h1, h2, and h3, which can be continued to any h{n}). It can be shown that a sum of probability formulas for three (or any number of) hypotheses alternative to h necessarily equals a single probability formula for ~h alone; therefore a single ~h by definition includes all three hypotheses. This can be iterated to all possible hypotheses. It's just that most of them have a P(h|b) and P(e|h.b) so small we don't even need to count them, e.g. "Joe got rich by my spitting to the left on Wednesday" has a nonzero prior probability (by virtue of our non-omniscience) and a nonzero consequent probability (ditto), but each so small they can have zero observable effect to any decimal place we'd ever bother caring about (so we ignore them). But this still means ~h includes even that hypothesis, as a matter of necessary logic: e.g. we could give it a formula box in the denominator as h4, say, which entails that any single denominator for only ~h alone would have to include the numbers for this h4 (and therefore it always does, it just doesn't matter because those numbers are so small).

    But exactly what part of all that they want to object to is unclear to me.
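    Both of Carrier's points here are easy to check numerically: the lottery prior must divide by the full sample space, and folding every alternative into a single ~h leaves the denominator of Bayes' theorem unchanged. A Python sketch (the numbers in the second part are made up for illustration):

```python
# Carrier's lottery illustration: 100 rich people, 10 by lottery, 80 by
# business, 1 by space aliens, 9 unaccounted for.
counts = {"lottery": 10, "business": 80, "aliens": 1, "unknown": 9}
total = sum(counts.values())  # 100
prior_lottery = counts["lottery"] / total
print(prior_lottery)          # 0.1 -- not 10/91, which is about 0.10989

# Lumping alternatives h1, h2, h3 into a single ~h leaves the denominator
# unchanged, provided P(e|~h) is the prior-weighted average of the
# alternatives' likelihoods. (All numbers below are hypothetical.)
priors = {"h": 0.5, "h1": 0.3, "h2": 0.15, "h3": 0.05}
likes  = {"h": 0.8, "h1": 0.4, "h2": 0.2,  "h3": 0.1}

denom_expanded = sum(priors[k] * likes[k] for k in priors)

p_not_h = 1 - priors["h"]
like_not_h = sum(priors[k] * likes[k] for k in ("h1", "h2", "h3")) / p_not_h
denom_lumped = priors["h"] * likes["h"] + p_not_h * like_not_h

print(abs(denom_expanded - denom_lumped) < 1e-12)  # True
```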


    Problem 3:

    Tim says:

    On p. 4, Carrier gives the following definition:

    P(~h|b) = 1 – P(h|b) = the prior probability that h is false = the sum of the prior probabilities of all alternative explanations of the same evidence (e.g. if there is only one viable alternative, this means the prior probability of all other theories is vanishingly small, i.e. substantially less than 1%, so that P(~h|b) is the prior probability of the one viable competing hypothesis.)

    [...] what is wrong with the explanation being offered here?

    No one has answered this yet, but Tim gives a hint:

    [Hint: does viability have anything to do with P(~h|b)? If sub-hypotheses under ~h have non-zero probability given b, even though that probability is low, do they still contribute to P(~h|b)?]

    Mike responds:

    P(~h|b) = 1 – P(h|b) simply means there is a 100% chance one of the two is correct. Assigning "viability" to one or the other simply exposes your priors.

    To which Tim responds:

    You're in the zone -- have a peppermint -- but there's something more direct to be said. Every sub-hypothesis under ~h that has a non-zero prior given b contributes to P(~h|b). So to say that if

    the prior probability of all other theories is vanishingly small, i.e. substantially less than 1%,

    then

    P(~h|b) is the prior probability of the one viable competing hypothesis

    is just mathematically wrong.

    Richard Carrier responds:

    The statement is that *if* there is one and only one viable *alternative* hypothesis (PDF p. 4) then "the prior probability of all *other* theories" i.e. all theories that are neither h nor this one viable alternative "is vanishingly small, i.e. substantially less than 1%." Which is actually just a tautology (I'm simply defining "viability," and wholly uncontroversially I might add), so they can have no objection to it. They are mistakenly assuming "all other theories" means "other than h" when I am clearly saying "other than h *and* the one proposed viable alternative." Once that is explained to them they should concede the point. (I italicized the word "other" in both instances in the hopes of making this clearer, although it should have been clear enough already).

    Outro:

    Muehlhauser seems in a premature hurry to save his intellectual reputation:

    When asked to guess at the competence in probability theory between two people who have been publishing peer-reviewed philosophy literature on probability theory for at least a decade [that would be the McGrews] vs. someone who discovered Bayes’ Theorem in the last few years [that would be Carrier], I’m going to bet on the former in a heartbeat.

    Unfortunately, that's a false dichotomy even from a non-expert perspective, since, as I pointed out, Carrier says he's had his material vetted by qualified people who generally approved of it with minimal changes.  The retaliatory Christians out and about on the internet on this issue are conveniently ignoring that (and continue to do so). Further, the disagreement between the McGrews and Carrier turned on miscommunication, not math competency, as Carrier and Lydia McGrew eventually agreed.

    Ben

Comments (6)

  • This business is just silly, but exactly what I would expect from the McGrew buffoons.  Their paper begins with the presupposition that the New Testament is historically reliable. Beg the question much?  Besides this, this childish exercise at Reppert's site is just poisoning the well.  It does not matter whether the paper by Carrier is complete garbage or not.  It is entirely irrelevant.

  • The title of this post is a bit misleading, considering there really wasn't much talk of Bayes' theorem. Instead, there was some nitpicking and some discussion of the proper probability interpretation. For one thing, you can tell Carrier is not a mathematician. Consider his statements assuming the probability in a population of being Mark is 0.25 (i.e., "1 in 4 people are named Mark"). He says if you were to then pick people at random, the probability of them all being Mark would be

    Pr(X1 = "Mark") * Pr(X2 = "Mark") * Pr(X3 = "Mark") = 0.25^3 = 1.563%

    What is missing is that the selection for X1 does not affect the population. This is what mathematicians call "selection with replacement." Suppose you select a card from a deck. The probability of drawing any particular card is its proportion of the deck: 1/52. But the probability that your first two draws are two particular cards is not (1/52 * 1/52); it is (1/52 * 1/51), because the second card is selected from the new deck of 51 cards left after the first is removed. To get the formula Carrier is using requires replacement. But this is an odd example, since it amounts to saying you pick someone from the population, "Hey, are you Mark? Good, now get back into the sample." That may be nitpicking, but it is of paramount importance to be clear when a sample is drawn with replacement versus without. It changes everything!

    However, Carrier is consistent with his example. His probability statement becomes Pr("At least 3 in 10 are named Mark") = 1 - Pr("Nobody is named Mark" OR "One person is named Mark" OR "Two people are named Mark"). Notice that I emphasize the "or" in this probability statement. The reason is that we tend to interpret "and" as multiplication and "or" as addition. In the above example, we were looking at the probability of selecting a card AND the probability of selecting the second card. Consider Pr("A card selected at random from a deck is red OR a king"). Here we have the sum Pr("card is red") + Pr("card is a king") - Pr("card is both red and a king") = 1/2 + 1/13 - 1/26 = 53.8%. The reason for the final term is that the first is the proportion of red cards in the deck (26/52), the second is the proportion of kings in the deck (4/52), and the two red kings (hearts and diamonds) have been double-counted (2/52).
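    That inclusion-exclusion count is easy to confirm by brute force (a Python sketch):

```python
# Count the red-or-king cards in a standard deck:
# 26 red + 4 kings - 2 red kings = 28 of 52.
suits = ["hearts", "diamonds", "clubs", "spades"]  # hearts and diamonds are red
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [(rank, suit) for suit in suits for rank in ranks]

hits = [card for card in deck
        if card[1] in ("hearts", "diamonds") or card[0] == "K"]
print(len(hits))                         # 28
print(round(len(hits) / len(deck), 4))  # 0.5385
```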

    The point is that Carrier's use of the probability axiom Pr(X) + Pr(¬X) = 1 is accurate. Our sample space consists of the possible outcomes, "i in 10 are named Mark." From this sample space we can construct events like A = "two particular people are named Mark" or B = "at least 3 are named Mark." Notice that Pr(A) is ambiguous because we don't know if there is replacement or not; so far Carrier's use has assumed it. So Pr(A) is the product of two independent trials, each with probability 0.25 of yielding a Mark: Pr(A) = (0.25)(0.25) = 0.0625, i.e. 6.25%, a fairly unlikely event. Now consider the more complicated event B. What does it mean to be "at least ..." something? If we were looking at some random count X, we might say Pr(X > 2) = Pr(X = 3 OR X = 4 OR ...). If this were a continuous probability, it would actually be an uncountably infinite statement!  It is much easier to consider its complement, X ≤ 2: Pr(X > 2) = 1 - Pr(X = 0 OR X = 1 OR X = 2). If the support were all numbers, that would do us no benefit, since we would have to consider all the negative values too. Knowing the support (i.e., the domain that has probability "mass" or "density") is crucial.

    So Carrier says Pr(B) = 1 - Pr(¬B). Now, what is required for his calculation is combinations. If we have a five-card poker hand and we ask Pr("three cards are red"), we need to calculate the probability that 3 of the cards are red and the other 2 are black. We use the "choose" operator C, and we say something like "26 choose 3" to say we're choosing 3 cards amongst 26 cards: 26C3 or C(26, 3). This accounts for the 3 cards selected amongst the 26 red cards. We still need to account for the 2 other cards. Remember replacement? It cannot happen in this event because we've already selected 3. Thus, we need C(26, 2) to choose the remaining two from the 26 black cards. We then weigh this against the overall number of possible hands, C(52, 5). Thus,

    Pr("Three cards in a poker hand are red") = C(26, 3)C(26, 2) / C(52, 5) ≈ 32.5%

    This, however, is utterly wrong. It's only part of the story! I tacitly assumed "three cards red in a poker hand" to mean "exactly 3." We always have to ask, "how many ways can this event manifest?" We have 3 red cards whenever we have exactly 3 red and 2 black, 4 red and 1 black, or all 5 red. Thus, we have THREE combinatorial terms to deal with: [C(26, 3)C(26, 2) + C(26, 4)C(26, 1) + C(26, 5)C(26, 0)] / C(52, 5) = 50%, exactly, by red/black symmetry. Note the last factor, C(26, 0), equals 1 and is included for clarity. Now, to make sense of this, it is good to know how Carrier expands what C means.

    C(n, k) = n! / (n-k)!k!

    Where "!" statements are factorials (e.g., 6! = 6 * 5 * 4 * 3 * 2 * 1), so C(6, 5) = 6! / 1!5! = 6 *( 5 * ... * 1) / (5 * 4 * ... * 1) = 6 since all the other multiples factor away to 1--i.e., 6! = 6 * 5! So C(6, 5) = 6! / 1!5! = 6 * 5! / 5! = 6. Combinations can get ugly, so using software helps!

    Now, if we have a messy combination like the above, which is basically "at least 3 cards are red," we could instead have looked at its complement: "no red cards ... one red card ... two red cards." Whether or not this is easier just depends on the situation considered. Combinations alone do not work for the Mark example, because there we are handed a probability (0.25 of being named Mark) rather than counts in a finite population (like 1/52 for a card drawn from a deck). Instead, Carrier makes use of the binomial formula, which uses combinations. That formula states the probability of achieving k successes in n independent "yes/no" trials, each of which yields a success with probability p. We can denote 1 - p by q. Thus, we have a function

    f(n, p, k) = C(n, k) * p^k * q^(n-k)

    I said before that Carrier was consistent with his example because each of these trials is taken to be independent. Thus, we can say the overall experiment is performed with replacement. The function f(n, p, k) gives the probability of finding k successes within n trials of "is this person Mark?"

    In our case, n = 10 and p = 0.25, so q = 0.75 and the only thing that varies is k. Writing f(k) for f(10, 0.25, k), Pr("At least 3 Marks") = f(3) + f(4) + ... + f(10), because those are the disjoint ways of getting at least 3 Marks among the 10 trials. This is the same as 1 - [f(2) + f(1) + f(0)], which is precisely what Carrier is calculating. Assuming his arithmetic is right, 1 - Pr(¬B) = 47.44%.

    This can be easily checked using the free software R, my favorite statistics program, which I use daily! Just enter into the terminal

    sum( dbinom(3:10, 10, 0.25) ) = 0.4744072

    The "3:10" is just feeding the function a vector of values to check, namely 3, 4, 5, ... 9, 10. We can also confirm the equivalent statement

    1 - sum( dbinom(0:2, 10, 0.25) ) = 0.4744072

    Or logically check by

    sum(dbinom(3:10, 10, 0.25)) == (1 - sum(dbinom(0:2, 10, 0.25)))

    The return value is TRUE.

    With this lesson over, let's consider problem one. The guys were dead wrong. Duke says,

    "So the probability that 3 people in a group of ten are named Mark where 25% of the population is named Mark is 0.25"

    and Tim agrees. Wrong. As just demonstrated, the probability that 3 people in a group of 10 are named Mark includes when exactly 3 in 10 are Mark, when 4 in 10 are Mark, ... when 10 in 10 are Mark. They made the boo boo I purposefully demonstrated in the card example. Duke tried to interpret the statement as "exactly 3 are ..." which is not the proper interpretation, especially with regard to how Carrier has framed the problem. Also, using R, the correct answer is

    dbinom(3, 10, 0.25) = 0.2502823

    So it's a little more than 25% actually.

    In the first version they attacked, however, Carrier was mistaken. His intent of demonstrating P(A) = 1 - P(¬A) was clear, but the formula he used answered a different question than the one he posed. His revised version corrected that problem by using the appropriate formula given his framing of the situation.

    Tim says, "Hence, P(not-A)= the probability that it is not the case that the first 7 people you meet successively are not named Mark."

    This is just wrong. There is no "first 7" here. They are thinking about the negative binomial distribution. This distribution includes another factor, r, which indicates how far out you need to go to get a failure. You use this distribution when answering questions like Pr("five heads in a row"). Here we want to know r = 6. We want to get five "successes" (heads) and then a failure OR 6 successes, ... You can see how messy this can get! Tim has now butchered Carrier's example into saying "Hi, what's your name? Mark, okay ... Hi, what's your name? Mark, okay ... Hi, what's your name? Donny? Quit!" That's not the situation at all. They're trying to make it out that a proper operation of a complement probability using the binomial turns it into a negative binomial! WRONG.

    It all hinges on how we interpret probability statements and the assumptions involved. Carrier does need to be more explicit with his assumptions (e.g., "with replacement" when appropriate). However, Tim has incorrectly interpreted basic probability statements. For instance, "At least 3 in 10 are ..." is not the same as "it is not the case that the first 7 people you meet successively are not named Mark." The correct interpretation under the complement description is "0 in 10 are named Mark OR 1 in 10 are OR 2 in 10 are named Mark." The reason is that "At least 3 in 10 are ..." is literally "3 in 10 are named Mark" OR "4 in 10 are named Mark" OR ... It has nothing to do with saying "7 are not named Mark." Even if we went that route, they would still have to say "7 are not named Mark OR 6 are not named Mark OR ..." Tim has some idea about probability theory, but he does not know what he's talking about.

  • Moving on to the other problems now, Tim seems to just be getting stupid. One thing to realize about hypotheses in probability, especially Bayes' theorem, is that we need them to be exclusive. If they are not exclusive, we cannot compare them! If we had mixed hypotheses we could compare, then we would have some way of breaking them into sub-hypotheses that were exclusive, and we would be comparing them precisely based on those. Another feature we desire is that they be exhaustive. In the denominator of Bayes' theorem is the total probability. If we have a simple hypothesis, say, "came from bin 1", then our alternative is "did not come from bin 1." So any analysis would have as an alternative the simple model shown on page 27 of Carrier's pdf. If we are not exhaustive, we cannot have probabilities that sum to 1. This can pose problems with assignments of prior probabilities, but it need not be an entire loss if we're only comparing two specific hypotheses.

    It is also important to recognize that all probability theory operates within the scope of some domain. That domain is an experiment with certain outcomes. The set of those outcomes is called its sample space. Events are collections of those outcomes, as explained before. When we start talking about certain observations (evidence) conferring support on some hypothesis, or vice versa, we need to realize that we're still talking about probabilities: events as collections of outcomes from a sample space of an experiment. If this cannot be rigorously detailed, then the probabilistic discussion is nonsensical. That may be extreme, but formally it is correct.

    So what effect does the inclusion of "explain the same evidence" have on the composite term ¬H? Exactly that we have to recognize our hypotheses are relative to a given experiment. Carrier may not have been aware of this, but that is what is entailed by his definitions if they are to be meaningful in this context. The other trite observation extends from above. What does ¬H mean? As indicated, H needs to be exclusive. Therefore, ¬H will include all other hypotheses within the sample space. Carrier's qualifier regarding evidence plays that role. I don't find it necessary, and it could be stated more precisely, but it plays the proper role in his tutorial.

    To get technical, let me provide an example. Rolling a die has six possible outcomes we can number {1, 2, 3, 4, 5, 6}. This is our sample space of that experiment. Events can be singular, such as {1}, {2}, ... {6} or composite {2, 4, 6} and {1, 3, 5}. What events are these? We could call them "rolling an even number" or "rolling an odd number", respectively. We might say {1, 2, 3} and {4, 5, 6} as two events. Which are they? Maybe "rolling a number less than 4" and its complement. It would make no sense to talk about comparing hypotheses of the sort {3, 6} and {2, 4, 6}, which can be "roll a multiple of 3" and "roll an even number", respectively. The problem is that an outcome of rolling a six agrees with both!

    That is the point of having exclusive sets. Two sets are exclusive if they share no common elements. In other words, the intersection of the two sets is empty. If it is nonempty, then they are not exclusive. We also require that the hypotheses be relevant to what is going on. The evidence, observation, or outcome must be from the power set of the sample space. The power set of a set is the set containing all the possible subsets of that set. Confusing? Rightfully so. Consider the set {1, 2}. The power set of this set is {0, {1}, {2}, {1, 2}}, where "0" here stands for the empty set (= {}). Thus, the possible outcomes for a binary set like that are those four possibilities: nothing happens, one or the other happens, or both happen. This might model a coin toss, for example. The power set of a set with n elements has 2^n elements. So there are 2^6 = 64 possible events from rolling a die. I dare not list them all!
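    For the curious, the power set is easy to generate by machine (a Python sketch; `power_set` is a hypothetical helper):

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of s, from the empty set up to s itself."""
    items = list(s)
    return list(chain.from_iterable(combinations(items, r)
                                    for r in range(len(items) + 1)))

print(power_set([1, 2]))         # [(), (1,), (2,), (1, 2)]
print(len(power_set(range(6))))  # 64 = 2^6 events for one die roll
```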

    So within the context of this experiment, we can pose various hypotheses (the situation isn't very interesting so I'm not going to bother; I suggest looking up my three blog posts on Bayes' theorem for more interesting ones). What is required is that they actually come from the power set of our sample space. There is nothing barring us from asking "what's the probability of rolling a 7?" Of course, it makes no sense! Nevertheless, saying our hypothesis is not one of these events does not necessarily guarantee our outcome is within the sample space. It is required mathematically since we've restricted our domain to the sample space, and complements can only be applied meaningfully in these contexts when a "parent set" is available. One might say our sample space is within the natural numbers. This is true, so talking about the hypothesis of rolling a 7 is not logically incomprehensible, but it is incomprehensible as part of this experiment, as being one of our observations, or as being evidence. Thus, Carrier's qualifier about hypotheses bearing on the evidence contextualizes this fact. But as I said, it would be better stated rigorously within the mathematics.

    To drive that point home, consider what it means to have givens in this experiment. Suppose we want to know Pr(X = 2 | "X is an even number"). This is not 1/6 because what the proposition "X is an even number" does is force us to measure the probability from within the subspace {2, 4, 6}. Thus, the Pr(.) = 1/3 in this situation. Now, consider Pr("X is an even number" | X = 2). We should more aptly state this Pr(X = 2 OR X = 4 OR X = 6 | "X is 2"). Well, now our sample space is {2} and we're asking what the probability of getting a 2, 4 or 6 is. Well, the latter two options are 0 (no mass in this support). So really we're looking at Pr(X = 2 | X = 2) + Pr(X = 4 | X = 2) + Pr(X = 6 | X = 2) = 1 + 0 + 0 = 1. One might say that all experiments are stated with an implied "given" that is nothing more than indicating its domain. So even the original die case with sample space S = {1, 2, ..., 6} can be stated like Pr(X | S). Then even these subspace examples provided here would be of the form, Pr(X | Y.S) where Y is the given context and "." is acting as conjunction.
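    Both of those conditional probabilities can be checked by enumerating the subspace directly (a Python sketch; `pr` is a hypothetical helper):

```python
from fractions import Fraction

space = [1, 2, 3, 4, 5, 6]  # one fair roll of a die

def pr(event, given=None):
    """Probability of `event`, measured within the subspace picked out by `given`."""
    sub = [x for x in space if given is None or given(x)]
    return Fraction(sum(1 for x in sub if event(x)), len(sub))

print(pr(lambda x: x == 2, given=lambda x: x % 2 == 0))  # 1/3
print(pr(lambda x: x % 2 == 0, given=lambda x: x == 2))  # 1
```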

    So Carrier has made no error in this "problem." Tim is just nitpicking something that is technically correct. As in the first problem, Tim is trying to interpret probability in a skewed fashion to discredit Carrier, but Tim is absolutely wrong in this case. If he has philosophical issues with what is considered a hypothesis, then that is another matter. Those are always issues: how do we define the experiment? How do we quantify the domain? Of what does the sample space consist? But trying to nitpick the notation of ¬H or something is just asinine. ¬H means "the exclusive and exhaustive alternative to H within the domain of discourse." We might not always be interested in exclusive hypotheses, but we should be. We might not always be interested in exhaustive hypotheses; that is fine, but it requires consideration when we calculate anything. The domain of discourse is the given universe, which is our sample space.

    As for problem 3, I think Carrier disposes of it quickly. He is offering a definition. If they don't like it, fine, but mathematically all he is saying is that the probability mass of our sample space beyond H and its one viable alternative A is nearly zero. Consider this partition of the space: {H, A, B, C}, with probabilities {.8, .199999, 0.0000005, 0.0000005}. Here the sum of these exclusive sets (partitions are always exclusive) equals 1, as required. All Carrier is saying is that A is a viable alternative to H under conditions such as this. He is not saying B and C contribute nothing, but numerically they are insignificant. This is not "mathematically wrong." It is just a convenience he is defining. Whether or not non-viable alternatives can become significant should be addressed within the scope of Bayesian probabilities; I am not sure, I'm not up on that stuff.

    The most Tim can say is that Pr(¬H | b) = Pr(A | b) is not a strict equality. That is correct; we should use approximate equality (≈). But that is effectively asking, "does 0.1999999999 = 0.2?" That depends on how much precision we want. If those 9's go out indefinitely, then as a real number they are equal; otherwise, we're cutting it off somewhere. All this "viability" is saying is that we can treat 0.1999999 as 0.2. The extent to which this is justified depends, by its very definition, on the extent to which the sum of all the non-viable hypotheses is trivial. Is everything less than 0.01 trivial? That is something Carrier needs to qualify. I'm sure there could be instances where we need that precision; most of the time, however, we can let it slip.

    But "mathematically wrong," as Tim says, is itself wrong. Carrier defined the damn thing that way. It is true by definition! Again, Tim is butchering interpretations to suit his agenda of discrediting Carrier's statements instead of assessing his statements as they are. Sure, he brought light to the combinatorial issue in problem 1, and it sheds light on precisely what Carrier is saying in this problem (as I articulated it), but neither of those has dire consequences for the program. Substantively, Carrier is correct, even if it isn't always as clear as it could be. But like I said, Carrier isn't a mathematician.
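    A quick numeric check of that partition (a Python sketch):

```python
# The partition {H, A, B, C} with the probabilities given above
parts = {"H": 0.8, "A": 0.199999, "B": 0.0000005, "C": 0.0000005}

# Exclusive and exhaustive: the pieces sum to 1
print(abs(sum(parts.values()) - 1.0) < 1e-9)  # True

# P(~H|b) and the one viable alternative differ only by the trivial remainder
p_not_H = 1 - parts["H"]
print(p_not_H - parts["A"] < 0.01)  # True: a "vanishingly small" gap
```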

  • Note, also, on that last example: when they're talking about Pr(H | b), etc., it is like I said above; they're talking about the universe of discourse. If we understand the context of our knowledge, we can just ignore talking about everything "given b." Technically, priors are the probability of the hypothesis before the evidence is considered. If we're assuming everything includes "given b," then we don't need to mention it. So it is edifying to include for philosophical reasons, but it is notationally unimportant and probably best left unused in practice, lest we make things harder to read and more confused. There's enough shit to keep track of in probability theory and complicated Bayesian expressions! lol So in definitions, it makes sense, but then caveat it out and never use it again! haha

  • I would recommend for the interested reader:

    Elliott Sober, "Evidence and Evolution."

    Not only is it relevant to the creationist topic, but the entire first chapter is one of the clearest descriptions of probability theory and Bayes' theorem I have ever read.

    Also,

    Robert Ash, "Basic Probability Theory"

    This is a more technical book, but read the Amazon reviews (sold me). The book is very clear and does a really good job explaining the critical concepts of probability theory my class never elaborated on. Very good book to understand the topic intuitively.

  • BTW, here are the links to the Bayes' theorem references on my blog that I made above.

    Bayes' Theorem
    Bayesian Epistemology
    Bayes Revisited

Post a Comment

Leave a Reply

Your email address will not be published. Required fields are marked *