Talk:Bayes' theorem/Archive 4

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Archive 1

Archive 2

Archive 3

→

Muggles and witches

As I stated during this discussion of the cookie example, I think that the medical example is more compelling, but do not object to having both. But I do object to eliminating the medical example in favor of the cookie example.

While reading the New York Times today, my attention was drawn to an article on The Genetic Theory of Harry Potter, which discusses questions about (according to the article) a recessive gene w for wizardry that is an allele that is normally represented by the dominant gene M for a muggle. That is, to be a wizard, one has to have two of the w allele, one from the father and one from the mother; if one has one w and one M, one is a muggle, but may possibly have children that are wizards, depending upon the alleles that ones spouse may have.

This made me think of a possibly more compelling way of presenting the ideas that were present in the cookie example. One can ask the question, for example, given that in the general population, a certain percentage of individuals are witches (that is, ww, homozygous for the w allele), assuming random mating (Hardy-Weinberg equilibrium, how does one calculate the percentage of individuals that are heterozygous (that is, Mw or wM, where the first letter indicates the allele inherited from the father and the second the allele inherited from the mother), and the percentage that are homozygous for M? I pass on the question about whether random mating is a reasonable assumption for people who are witches or who know that they are descended from witches; this is a reasonable question, but beyond the scope of my comments.

The paragraph above allows us to compute the prior on the three (four if one distinguishes Mw and wM) cases. Since the frequency of ww is under our control (as pedagogues), we can manufacture any example we wish.

Now one can pose questions like: Suppose a couple has three children, all muggles. What is the probability that neither parent has the w allele? What is the probability that both parents have genotype Mw? What is the probability that one parent has genotype Mw and the other is MM? What is the probability that both have genotype ww (zero, but one can calculate this formally from Bayes' theorem).

Or, same questions, except that the couple has three children, one wizard and two muggles? Or if they have three children, all wizards?

As you can see, the questions one may ask are quite varied and all illustrate both the idea of setting a prior, and the idea of how to turn a prior into a posterior, given data.

Now, I am not wedded to the muggles-wizards thing here, although I think that many young people just getting to college may have grown up with Harry Potter and may find this an interesting and compelling example. It could just as well be an example with (say) the sickle cell trait, or some other actual biological example. But I am thinking of using it in my own teaching. So, I put it out here for your consideration. Bill Jefferys 00:14, 12 December 2005 (UTC)

Sounds interesting. Maybe you could write a rough draft on your own User page, and then when you think you have something that is ready for prime time, you could present it as a proposed example for the Bayes' Theorem page and then we could discuss it here on this page if you wanted some outside feedback. It's entirely up to you of course. -- Metacomet 01:47, 12 December 2005 (UTC)

OK, give me a few days; semester is ending and I have things that have priority. Bill Jefferys 02:49, 12 December 2005 (UTC)

A surprising example

A friend sent me this surprising example. Suppose there is a test for detecting whether an unborn child is a boy or a girl. If the child is a boy, the test is "perfect": P(Test B|B)=1; if the child is a girl, it is not so good: P(Test G|G)=0.7. Bayes' theorem readily gives the result that P(B|Test B)=10/13, whereas P(G|Test G)=1. The surprising thing is that the "perfection" of the test is transferred from boys to girls when the conditioning is reversed. My friend notes that this can be considered a version of the prosecutor's fallacy, AKA the Harvard Medical School fallacy.

I don't know if this could be used as an example in this article or whether it should appear in the prosecutor's fallacy article. In any case, it is rather counterintuitive and deserves mention somewhere.

I'll mention this on the talk page of the prosecutor's fallacy article. Bill Jefferys 21:32, 16 December 2005 (UTC)

I can never figure these examples out without setting up my trusty 3 x 3 table of joint probability. I worked it out for this example using the numbers you provided and the definition of conditional probability:

	Test = B	Test = G	Totals
Actual = B	0.50	0.00	0.50
Actual = G	0.15	0.35	0.50
Totals	0.65	0.35	1.00

It was not obvious to me, maybe it was to some others, but the key to the surprising result is the zero joint probability for the test finding a girl when the actual is a boy, and the non-zero joint probability for the test finding a boy when the actual is a girl (the dual case). Anyway, it is an intriguing example. Bill, thanks for posting it here. -- Metacomet 22:36, 16 December 2005 (UTC)

This actually caused confusion at the RP article. An RP machine has the property that:

If the answer is NO, it always returns NO.
If it ever returns YES, the answer is YES.

We ended up having to explain both carefully to avoid any further confusion. Deco 04:07, 17 December 2005 (UTC)

This is only "surprising" because of the misleading phraseology used when describing the problem. If the problem is stated in a less colloquial manner, the result is not at all surprising:

There is a single test, denoted by the random variable T, which is used to determine the sex of an unborn child. The test can produce two results: either T=boy or T=girl. When the child is a boy, it is known that P(T=boy)=1. When the child is a girl, it is known that P(T=girl)=0.7. This means that the test can only be incorrect when it reports that T=boy. This obviously makes the test less reliable when this result is observed and immediately leads to P(boy|T=boy) < P(girl|T=girl).

No use of Bayes' theorem is necessary. One could speculate that the reason why this was thought to be surprising is the complete absence of logic as a subject in its own right from the modern school curriculum. However, times and fashions change. Lukestuts 14:46, 22 December 2005 (GMT)

Unborn children do not generally have sex. Usually they wait until after they are born, and then for at least 12 years or so. They do, however, have a gender. -- 24.218.218.28 16:00, 22 December 2005 (UTC)

I agree that the confusion comes from imprecise language. Epidemiologists and statisticians who work with screening application like the one above refer to four test characteristics that characterize the performance of a test. These are the four conditional probabilities (with apologies for apparent gender bias):

1. Predictive value positive ==> P(B|Test B)

2. Predictive value negative ==> P(G|Test G)

3. Specificity ==> P(Test B|B)

4. Sensitivity ==> P(Test G|G)

So in the example, the specificity is 1, the sensitivity is 0.7, the PVP is 10/13 and the PVN = 1. I think the confusion comes from thinking about the test as being 'perfect'. - Ken K 21:21, 1 March 2006 (UTC)

B implies testB, ergo ~testB implies ~B, which means testG implies G. Anon, Wed Jul 22 17:58:28 EDT 2009. —Preceding unsigned comment added by 69.138.47.104 (talk) 22:58, 22 July 2009 (UTC)

No, you don't need non-zero probabilities to have well-defined conditionals

[snide remarks deleted]

It is commonplace to say, for example, that the conditional distribution of Y given X is normal with expectation X and variance 1, and X itself is normal with expectation 0 and variance 1. In that case, one is obviously conditioning on an event of probability zero. There's nothing wrong with that. It does mean, however, that the identity Pr(A|B) = Pr(A & B)/Pr(B) would not apply. Michael Hardy 23:41, 4 December 2005 (UTC)

I am bit confused by your explanation. You have said that X is normal with expectation 0 and variance 1, and then you say that we are conditioning on an event of probability zero. Are you talking about event X ? If so, is it true that the probability is zero? The pdf (scratch that, make it the CDF) of X, call it f(X), is greater than zero and monotonically non-decreasing for all non-infinite X. So then, P( a < X < b) = f(b) – f(a) > 0 (in general, although could be = 0 in special cases) for all a and b where b > a. So in what sense is the probability of X equal to zero?

X is not an event; X is a random variable. The event on which one is conditioning here is the event that X has a particular value. That is an event of probability 0, because X is a continuous random variable. The pdf of X is certainly NOT non-decreasing; it's the "bell curve" that increases and then decreases (maybe you meant the cdf rather than the pdf?). OK, I've looked closely at your next sentence. Apparently you did mean the cdf. There's no such thing as the probability of X, since X is not an event. The probability that X has a particular value is an event of probability 0. Michael Hardy 00:04, 11 December 2005 (UTC)

Sorry, you are right, I did mean the CDF and not the pdf. -- Metacomet 00:30, 11 December 2005 (UTC)

So if we define Event A as the event where the continuous random variable X is equal to a specific value, for instance X = a, then of course P(A) = 0 as you have said. But, for a different event, say Event B such that b < X < c, then the probability of this event is not zero, P(B) > 0. But that means that even though X is a continuous random variable, we are now talking about discrete events A and B defined in terms of X. -- Metacomet 00:30, 11 December 2005 (UTC)

Please note, I am asking these question purely in good faith. I have no agenda other than that I am confused and I would like to understand. I am not trying to bust anyone's chops or to forward any particular point of view. I would greatly appreciate your help in understanding this example. Thanks. -- Metacomet 18:16, 10 December 2005 (UTC)

Michael Hardy's point is that in the particular case he considered, the probability of observing a particular value of a continuously distributed quantity (such as a quantity that is normally distributed) is zero. This is not a problem for Bayes' theorem, because in the case of continuously distributed quantities the correct approach is to go over to the probability density for x, which is not zero for any given x. The probability density in his example is given by the standard normal distribution. Bill Jefferys 22:18, 5 December 2005 (UTC)

Isn't this what Bayes.27_theorem_for_probability_densities says? --Henrygb 23:03, 5 December 2005 (UTC)

I will repeat my question again. For two discrete random variables A and B, if the probability of B is zero, then what meaning is there in trying to determine the conditional probability of A given B ? In fact, is it not the case that if P(B) = 0, then in fact P(A|B) is completely indeterminant, and can take on any value whatsoever? If P(B) = 0, then B cannot occur, so how can we talk about the probability of A contigent on B, an event that never happens?

Mathematically, if P(B) = 0, then it follows that P(A&B) = 0. Since P(A|B) is the ratio of P(A&B) divided by P(B), it then follows that P(A|B) is zero divided by zero, which can take on any value, including values less than zero and greater than one. Of course, it would be absurd for a probability to take on these values, but nevertheless, there it is. So my conclusion is, that in order to have a meaningful value for P(A|B), then P(B) cannot equal zero. Could someone please tell me if that is correct or incorrrect, and if not, why not. -- Metacomet 00:38, 6 December 2005 (UTC)

For discrete random variables, your point is valid, but you'll notice I spoke of normally distributed random variables, so they're not discrete. Michael Hardy 01:25, 6 December 2005 (UTC)

Thank you. In most cases where this issue came up, I was in general talking about discrete random variables, although I did not always make that assumption explicit. I understand that Bayes' theorem can be applied to continuous random variables (as the article points out), and I understand the difference between probability and probability density functions. I appreciate your willingness to help me understand the issue involving marginal probabilities that are equal to zero. I need to spend some more time trying to understand the continuous case, and how it differs from the discrete case. Thanks again. Regards, -- Metacomet 01:41, 6 December 2005 (UTC)

Bottom-line

So what's the bottom-line? It seems to me that my original point was in fact correct. If we have two discrete random events, A and B, then in order to have meaningful conditional probabilities, P(A|B) and P(B|A), we must have non-zero marginal probabilities:

P(A)\neq 0

and

P(B)\neq 0

Otherwise, the conditional probabilities become zero divided by zero, which is completely indeterminant, as I discussed above.

Furthermore, even if we are dealing with a continuous random variable X, we still end up defining discrete events A and B in terms of X, in which case once again, in order to have meaningful conditional probabilities, the marginal probabilties cannot equal zero.

-- Metacomet 16:55, 22 December 2005 (UTC)

Good example of a flame-out

[snide remarks deleted]

The reason I am editing this page is because it desperately needed improvement. One of the biggest problems, which was not identified by me but rather by others, was that it was written in a way that was way too technical for a general audience to understand. Oh, and look, that is exactly at the heart of our disagreement over the cookies example. You still have made no credible attempt to identify any valid reasons for deleting the cookies example. What are you afraid of? Why are you totally opposed to helping people understand technical concepts? Or is it more fun to obfuscate ideas with obscure terminology? -- Metacomet 04:45, 5 December 2005 (UTC)

More straw men. Wile E. Heresiarch 05:59, 5 December 2005 (UTC)

Good answer.

OK, I'm going to get to this article soon, when I'm feeling energetic. Michael Hardy 03:21, 5 December 2005 (UTC)

So, is it worth discussing here under what circumstances the identity applies, or is that sufficiently covered under conditional probability? --MarkSweep (call me collect) 04:00, 5 December 2005 (UTC)

I'm inclined to steer around the difficulty in this article and leave interesting details to conditional probability. But it could go the other way too; if we had some alternative texts in front of us, it might be easier to choose. Wile E. Heresiarch 06:04, 5 December 2005 (UTC)

In all seriousness, tell me what I am missing. If P(B) = 0, then from P(A&B) = P(A|B) P(B) two things are clear: (1) P(A&B) = 0, and (2) P(A|B) is an indeterminate quantity (okay, not undefined, but indeterminate). Is that correct?

On the other hand, in the real world, if P(B) = 0, then why would I care what P(A|B) is? If event B never happens, then trying to find P(A|B) is completely meaningless. Who would even want to ask such a question? -- Metacomet 05:29, 5 December 2005 (UTC)

"I don't understand what's going on here, but I'll tell you what to do anyway" is a weak position to argue from, but you don't let that slow you down. I'm accustomed to arguing with people who know what they're talking about; I really don't know how to deal with you. Wile E. Heresiarch 05:59, 5 December 2005 (UTC)

[snide remarks deleted]

It is interesting to note that rather than answering my (legitimate) question, you chose to attack me personally. This is not about me. It is about trying in good faith to improve Wikipedia in general and this article in particular. Or maybe for you it is about something else.... -- Metacomet 13:56, 5 December 2005 (UTC)

No response....

Also, it was Michael Hardy who claimed that it is not necessary for the marginal probabilities to be nonzero. I am not convinced. That doesn't mean I don't know what is going on, that means that he made a statement that I do not understand. I have asked for an explanation, but so far none has been forthcoming. If I am wrong, then I will be the first to admit it (unlike some other people). I am not so fragile that I cannot admit when I do not understand something or when I make a mistake. Grow up! -- Metacomet 14:00, 5 December 2005 (UTC)

No response...

I am accustomed to dealing with people who are interested in learning and growing, not people who need to feed their own ego by showing how much smarter they are than everyone else. -- Metacomet 14:02, 5 December 2005 (UTC)

No response...

Two more points:

At least I have an open mind, and I am willing to consider a point of view different from mine.
Where I come from, asking a question when I don't understand something is not a sign of weakness; it's a sign of strength. The weak person is the one who pretends to understand even when he doesn't.

-- Metacomet 15:22, 5 December 2005 (UTC)

No response...

Simplified explanation on top?

I am proposing putting something like the following into the article:

If I flip a double-headed coin, the probability of getting a head is 1. However, if I flip a coin and get a head, what is the probability that the coin was double-headed? This is an example where Bayes' theorem will apply.

x42bn6 Talk 07:00, 5 December 2005 (UTC)

Possible plagarism

The 'false positivies in a medical test' is lifted directly from "A First Course in Probability" 6th. ed, by Sheldon Ross. ISBN 0-13-033851-6.

Is it lifted word for word, including the same numbers, or is it just the same example? The medical test example is very common. Deco 04:11, 17 December 2005 (UTC)

We could change it to an actual disease with actual figures, which would be better pedagogically anyway. See above for one possibility. Bill Jefferys 20:41, 17 December 2005 (UTC)

Bayes theorem requires a key assumption

I just wanted to point out that the bayes' theorem is only useful IF there is no correlation between the frequency with which the information given is given, and the outcome regardless of your awareness of such a correlation. If there is such a correlation, an adjustment needs to be made.

To demonstrate this, just look at the Monty Hall problem. If you were not told how the host in this problem was choosing doors to open, you would simply use Bayes' theorem along with the given information that you did not choose the goat he shows you. This calculation would lead you to a 1/2 chance to have chosen a goat or the car, given you didn't choose the goat he shows you. But empyrical results would show a 1/3 chance for you to get the car by staying and a 2/3 chance to get the car by switching. This is because there was a correlation between how often you were given the information you were given and the outcome... whether or not you knew it.

Perhaps one can simply adjust by using this formula in place of Bayes' theorem when such a correlation is known:

Any time such a correlation exists, whether you know it or not, Bayes' theorem would lead you to an incorrect answer. - T. Z. K.

I have deleted this sentence, because the theorem as stated is correct and because the sentence is incomprehensible without explanation:

Bayes' theorem depends on the assumption that there is no correlation between the frequency with which information is given and the outcome.

Although I have not digested what is written above, I think that if there's an error, it is an erroneous way of applying the theorem, rather than an error in the way the theorem was stated. It says "you would simply use Bayes' theorem along with the given information that you did not choose the goat he shows you". But I think that if there is additional information to be used, it should have been included within "B" in the expression P(A|B). I don't know what "B is given" means above. The comments above are really not written clearly. Michael Hardy 22:08, 9 March 2006 (UTC)

I agree that if you know the correlation between the frequency with which information is given to you and the outcome, then you could manipulate the events to incorporate this. But this is irrelevant for 2 reasons. 1) You don't always know whether or not such a correlation exists, in which case using bayes' theorem would just give you the wrong answer. If you don't know about this assumption then you wouldn't know why you got the wrong answer. 2) For any given B this statement is always true.

Oh -- one other thing: "compliment" and "complement" are two different words that mean two different things. You wrote the former where you clearly meant the latter. Michael Hardy 22:39, 9 March 2006 (UTC)

Merely a typo. If you have an ulterior motive in pointing this out, let it be known that you are not here to help people arrive at a better understanding of things.

Correlation isn't the right word, in any case. In statistics, 'correlation' has a specific meaning. What is meant is 'independence'. Things can be uncorrelated, but still dependent. And, in the Monty Hall example above, the mistake is failure to write down the correct likelihood function, assuming independence when it doesn't hold. Bill Jefferys 02:41, 10 March 2006 (UTC)

What is meant is correlated. If it is uncorrelated, but dependent, it doesn't matter. But thanks for trying to tell me what I meant rather than asking... Also, the fact that you can consider the given information to be that monty reveals a goat rather than simply that you didn't choose the revealed goat, doesn't change the fact that you are also given the information that you did not choose the shown goat. If you were not told of monty's strategy then this would be all the information you would have, yet using bayes' theorem with it would give you the wrong answer.

Ludicrously overtechnical and hard to read

I understood the introduction. So far, so good. Then I went to "Statement..." expecting a clear statement of what the theorem actually is in plain English. Oh dear. This article is absolutely useless to the layperson. I daresay I could walk away with an understanding of Bayes' theorem if I waded through all the symbols but why would I bother? I'm sure I can find a straightforward explanation of it somewhere else. And I can't sofixit. I have no idea what it should say; I only know it isn't saying it.-- Grace Note

Specifically, at which point does the difficulty begin for you? In "plain English" I would say that Bayes' theorem says: multiply the prior probability distribution by the likelihood function and then normalize, to get the posterior probability distribution. But before that could be understood, I would first have to explain the terms. This article begins with a formula that doesn't require such knowledge, but only a secondary-school-level grasp of what is meant by such things as Pr(A|B), etc. If you want it to be comprehensible to "laypersons" who don't even know that much, you're asking too much, I think. Could you be specific about which is the first thing you couldn't understand? Michael Hardy 23:51, 3 April 2006 (UTC)

PS: Don't take the above to be any sort of complete endorsement of the way the article is now written. Michael Hardy 23:52, 3 April 2006 (UTC)

Hello, we have an actual application of Bayes' theorem and we're discussing it here, hopefully in a way that the average lay person can understand: http://www.thebroth.com/blog/118/bayesian-rating

The text is about how to use Bayes' theorem for online rating facilites - anything where you can rate or vote. Is that something that should go into External Links or does it have value as an application example somewhere in the actual text? Wyxel 10:56, 20 April 2006 (UTC)

I don't see how your rating system actually applies Bayes' theorem. --Henrygb 21:42, 20 April 2006 (UTC)

Innumeracy

Should we mention or link to Innumeracy: Mathematical Illiteracy and its Consequences? The book discusses cognitive difficulties in, and proposes some solutions to, the interpretation of Bayes' rule. Thanks! --63.138.93.195 02:32, 23 April 2006 (UTC)

Definition of Likelihood

I have reverted a recent edit that incorrectly wrote the sampling distribution P(B|A) as proportional to the likelihood L(B|A). This is contrary to customary notation, which writes P(B|A) as proportional to L(A|B). See the article on likelihood.

The reason is that the sampling distribution P(B|A) is an actual, normalized probability, with A fixed (given) and B stochastic. But the likelihood L(A|B) is considered as a function of A, with B fixed (given) and A stochastic (or unknown). The likelihood is not normalized and is not a probability. Bill Jefferys 00:41, 22 May 2006 (UTC)

Medical test

Why was the medical test example removed? --best, kevin [kzollman][talk] 21:37, 28 September 2006 (UTC)

Graphical explanation

There is a nice graphical explanation here (p. 34). Maybe it could be included in the article. --Tgr 08:26, 29 September 2006 (UTC)

Wording of the question

The question "what’s the probability that Fred picked bowl #1, given that he has a plain cookie?” is not worded properly. The probability that Fred picked bowl #1 is 0.5, regardless of what cookies we see afterwards. We can't recompute probability of an event that already happened (The probability that a coin landed on "tails" when I flipped it ten years ago is still 0.5 today) and wording questions that attempt this is one of the most common sources of confusion students have in probability theory classes. The proper wording of the question is "What's our belief that Fred picked bowl #1, given that we saw him draw a plain cookie?". That's the posterior, the prior is: "What's our belief that Fred picked bowl #1 before we saw any cookies?". Please revise the example to address this issue.

I think you're misunderstanding that this is about conditional probability. "The probability of A given B" is a conditional probability. Michael Hardy 21:18, 6 October 2006 (UTC)

Just a minor nit: In the text below, the statement "We may assume..." almost sounds like the problem gives us this option. Would the alternate "We are to assume..." be a bit clearer. Also, is there a page that shows the way to solve this problem when the assumption is not made (ie: P(A) is an unknown)? Dan Oetting 20:13, 4 September 2007 (UTC)