Think Bayes - Bayes’s Theorem

Conditional probability

The fundamental idea behind all Bayesian statistics is Bayes’s theorem, which is surprisingly easy to derive, provided that you understand conditional probability. So we’ll start with probability, then conditional probability, then Bayes’s theorem, and on to Bayesian statistics.
We start with probability, then move on to conditional probability, then Bayes’s theorem, and finally Bayesian analysis.

A probability is a number between 0 and 1 (including both) that represents a degree of belief in a fact or prediction. The value 1 represents certainty that a fact is true, or that a prediction will come true. The value 0 represents certainty that the fact is false.
A probability is a number from 0 to 1, inclusive, that represents how much credence we give to a fact or a prediction.

Intermediate values represent degrees of certainty. The value 0.5, often written as 50%, means that a predicted outcome is as likely to happen as not. For example, the probability that a tossed coin lands face up is very close to 50%.

A conditional probability is a probability based on some background information. For example, I want to know the probability that I will have a heart attack in the next year. According to the CDC, “Every year about 785,000 Americans have a first coronary attack” (http://www.cdc.gov/heartdisease/facts.htm).
A conditional probability is a probability based on some background information. The running example starts here: each year about 785,000 Americans have a first heart attack.

The U.S. population is about 311 million, so the probability that a randomly chosen American will have a heart attack in the next year is roughly 0.3%.
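The U.S. population is about 311 million, so the probability that a randomly chosen American has a heart attack is about 0.3%, or roughly three in a thousand.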
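The arithmetic behind that estimate can be checked in a few lines. This is only a sketch using the rounded figures quoted above; the variable names are illustrative and not from the book.

```python
# Rounded figures quoted in the text (CDC count and U.S. population).
first_heart_attacks_per_year = 785_000
us_population = 311_000_000

# Probability that a randomly chosen American has a first heart attack this year.
p_heart_attack = first_heart_attacks_per_year / us_population

# Prints a value near a quarter of a percent, which the text rounds to 0.3%.
print(f"{p_heart_attack:.2%}")
```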

But I am not a randomly chosen American. Epidemiologists have identified many factors that affect the risk of heart attacks; depending on those factors, my risk might be higher or lower than average.
But “I” am not a randomly chosen person. Epidemiologists have identified many factors that affect heart attack risk, and depending on those factors “my” risk may be above or below average.

I am male, 45 years old, and I have borderline high cholesterol. Those factors increase my chances. However, I have low blood pressure and I don’t smoke, and those factors decrease my chances.
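“I” am male, 45 years old, with borderline high cholesterol, and these factors raise my risk. On the other hand, I have low blood pressure and do not smoke, which lowers it.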

Plugging everything into the online calculator at http://hp2010.nhlbihin.net/atpiii/calculator.asp, I find that my risk of a heart attack in the next year is about 0.2%, less than the national average. That value is a conditional probability, because it is based on a number of factors that make up my “condition.”
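Entering these factors into the online calculator gives “my” risk for the next year as about 0.2%, or two in a thousand, which is below the national average. This value is a conditional probability because it is based on the factors that make up my condition.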

The usual notation for conditional probability is p(A|B), which is the probability of A given that B is true. In this example, A represents the prediction that I will have a heart attack in the next year, and B is the set of conditions I listed.
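The usual notation for a conditional probability is p(A|B): the probability of A given that B is true. In the example above, A is the prediction that “I” will have a heart attack in the next year, and B is the set of conditions I listed.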

Conjoint probability

Conjoint probability is a fancy way to say the probability that two things are true. I write p(A and B) to mean the probability that A and B are both true.
Conjoint probability is just a way of expressing the probability that two things are both true. We write p(A and B) for the probability that A and B are both true.

If you learned about probability in the context of coin tosses and dice, you might have learned the following formula:

p(A and B) = p(A) p(B)     WARNING: not always true

If you learned probability from coin tosses, you will have seen the formula above.

For example, if I toss two coins, and A means the first coin lands face up, and B means the second coin lands face up, then p(A) = p(B) = 0.5, and sure enough, p(A and B) = p(A) p(B) = 0.25.
In other words, with two coins where A is the first coin landing face up and B is the second coin landing face up, each event has probability 0.5, so the probability that both are true is 0.25.

But this formula only works because in this case A and B are independent; that is, knowing the outcome of the first event does not change the probability of the second. Or, more formally, p (B|A) = p(B).
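But the formula above holds only when A and B are independent, that is, when the outcome of A does not change the probability of B, or more formally p(B|A) = p(B). Read literally, the equation says that the probability of B given that A happened is the same as the probability of B on its own, so whatever happens with A has no bearing on B.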

Here is a different example where the events are not independent. Suppose that A means that it rains today and B means that it rains tomorrow. If I know that it rained today, it is more likely that it will rain tomorrow, so p (B|A) > p (B).
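Here is a different example in which the events are not independent. Suppose A means it rains today and B means it rains tomorrow. If I know it rained today, rain tomorrow is more likely, so p(B|A) > p(B).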

In general, the probability of a conjunction is

p(A and B) = p(A) p(B|A)

for any A and B. So if the chance of rain on any given day is 0.5, the chance of rain on two consecutive days is not 0.25, but probably a bit higher.
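The formula above holds for any events A and B. So if the chance of rain on any given day is 0.5, the chance of rain on two consecutive days is not 0.25 but somewhat higher.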
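The difference between the independent and the dependent case can be sketched in code. The helper name `p_and` is made up for this illustration, and the value p(B|A) = 0.6 for the rain example is an assumption chosen only to show dependence; the text gives no number.

```python
def p_and(p_a, p_b_given_a):
    """General conjunction rule: p(A and B) = p(A) * p(B|A)."""
    return p_a * p_b_given_a

# Independent coin tosses: p(B|A) equals p(B) = 0.5.
p_two_heads = p_and(0.5, 0.5)

# Rain on consecutive days: p(B|A) = 0.6 is a made-up value, used only
# to illustrate that dependence pushes the result above 0.25.
p_two_rainy_days = p_and(0.5, 0.6)

print(p_two_heads)       # 0.25
print(p_two_rainy_days)  # 0.3
```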

The cookie problem

From the name alone I could not tell what this problem is about.
We’ll get to Bayes’s theorem soon, but I want to motivate it with an example called the cookie problem. Suppose there are two bowls of cookies. Bowl 1 contains 30 vanilla cookies and 10 chocolate cookies. Bowl 2 contains 20 of each.
The cookie problem: suppose there are two bowls of cookies. Bowl 1 holds 30 vanilla and 10 chocolate cookies, and Bowl 2 holds 20 of each.

Now suppose you choose one of the bowls at random and, without looking, select a cookie at random. The cookie is vanilla. What is the probability that it came from Bowl 1?
Now suppose you pick a bowl at random and take a cookie from it. The cookie is vanilla. What is the probability that it came from Bowl 1?

This is a conditional probability; we want p(Bowl 1|vanilla), but it is not obvious how to compute it. If I asked a different question, the probability of a vanilla cookie given Bowl 1, it would be easy:

p(vanilla|Bowl 1) = 3/4

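This is a conditional probability problem. What we want is p(Bowl 1|vanilla), which is not easy to compute directly. The reverse question, the probability of a vanilla cookie given Bowl 1, is easy: it is 3/4.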

Sadly, p(A|B) is not the same as p(B|A), but there is a way to get from one to the other: Bayes’s theorem.
p(A|B) differs from p(B|A), but Bayes’s theorem lets us get from one to the other.

Bayes’s theorem

At this point we have everything we need to derive Bayes’s theorem. We’ll start with the observation that conjunction is commutative; that is

p(A and B) = p(B and A)

for any events A and B.
We start from commutativity: for any events A and B, the probability of A and B is the same as the probability of B and A.

Next, we write the probability of a conjunction:

p(A and B) = p(A) p(B|A)

Since we have not said anything about what A and B mean, they are interchangeable. Interchanging them yields

p(B and A) = p(B) p(A|B)

Because we have not said what A and B are, they are interchangeable, and swapping them gives the formula above.

That’s all we need. Pulling those pieces together, we get

p(B) p(A|B) = p(A) p(B|A)

Combining the two formulas above gives this equation.

Which means there are two ways to compute the conjunction. If you have p(A), you multiply by the conditional probability p(B|A). Or you can do it the other way around; if you know p(B), you multiply by p(A|B). Either way you should get the same thing.
Both ways of computing it give the same result.

Finally we can divide through by p(B):

p(A|B) = p(A) p(B|A) / p(B)

And that’s Bayes’s theorem! It might not look like much, but it turns out to be surprisingly powerful.
And that is the powerful Bayes’s theorem.
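The derivation can be checked numerically on a small example. The joint counts below are hypothetical, invented purely for illustration; exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction

# Hypothetical joint counts for two events A and B (made up for illustration).
counts = {
    (True, True): 12,    # A and B
    (True, False): 8,    # A and not B
    (False, True): 18,   # not A and B
    (False, False): 62,  # neither
}
total = sum(counts.values())

p_a = Fraction(counts[True, True] + counts[True, False], total)
p_b = Fraction(counts[True, True] + counts[False, True], total)
p_a_and_b = Fraction(counts[True, True], total)

# Conditional probabilities computed directly from the counts.
p_b_given_a = Fraction(counts[True, True], counts[True, True] + counts[True, False])
p_a_given_b = Fraction(counts[True, True], counts[True, True] + counts[False, True])

# Both factorizations of the conjunction agree...
assert p_a * p_b_given_a == p_b * p_a_given_b == p_a_and_b
# ...and dividing through by p(B) gives Bayes's theorem.
assert p_a_given_b == p_a * p_b_given_a / p_b

print(p_a_given_b)  # 2/5
```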

For example, we can use it to solve the cookie problem. I’ll write B1 for the hypothesis that the cookie came from Bowl1 and V for the vanilla cookie. Plugging in Bayes’s theorem we get

p(B1|V) = p(B1) p(V|B1) / p(V)

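For example, we can use it to solve the cookie problem. Let B1 be the hypothesis that the cookie came from Bowl 1 and V the vanilla cookie. Substituting into Bayes’s theorem gives the formula above.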

The term on the left is what we want: the probability of Bowl 1, given that we chose a vanilla cookie. The terms on the right are:

  • p(B1): This is the probability that we chose Bowl 1, unconditioned by what kind of cookie we got. Since the problem says we chose a bowl at random, we can assume p(B1) = 1/2.
  • p(V|B1): This is the probability of getting a vanilla cookie from Bowl 1, which is 3/4.
  • p(V): This is the probability of drawing a vanilla cookie from either bowl. Since we had an equal chance of choosing either bowl and the bowls contain the same number of cookies, we had the same chance of choosing any cookie. Between the two bowls there are 50 vanilla and 30 chocolate cookies, so p(V) = 5/8.

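The probability of choosing Bowl 1 is 1/2. p(V|B1), the probability of V given B1, is 3/4 (30 vanilla cookies out of 40). p(V) is the sum over both bowls of the probability of choosing that bowl times the probability of vanilla in it; equivalently, following the author’s reasoning, every cookie is equally likely to be chosen, so with 50 vanilla cookies out of 80 in total the probability of vanilla is 5/8.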

Putting it together, we have

p(B1|V) = (1/2) (3/4) / (5/8)

which reduces to 3/5. So the vanilla cookie is evidence in favor of the hypothesis that we chose Bowl 1, because vanilla cookies are more likely to come from Bowl 1.
So the vanilla cookie supports the hypothesis that we chose Bowl 1, because vanilla cookies are more likely to come from Bowl 1.
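The same calculation can be written out with exact fractions. This is a minimal sketch with illustrative variable names; it simply plugs the three terms listed above into Bayes’s theorem.

```python
from fractions import Fraction

p_b1 = Fraction(1, 2)            # prior: the bowl is chosen at random
p_v_given_b1 = Fraction(30, 40)  # 30 vanilla out of 40 cookies in Bowl 1
p_v = Fraction(50, 80)           # 50 vanilla out of 80 cookies overall

# Bayes's theorem: p(B1|V) = p(B1) p(V|B1) / p(V)
p_b1_given_v = p_b1 * p_v_given_b1 / p_v

print(p_b1_given_v)  # 3/5
```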

This example demonstrates one use of Bayes’s theorem: it provides a strategy to get from p(B|A) to p(A|B). This strategy is useful in cases, like the cookie problem, where it is easier to compute the terms on the right side of Bayes’s theorem than the term on the left.
The strategy is to convert p(B|A) into p(A|B). It helps whenever the right side of Bayes’s theorem is easier to compute than the left.

The diachronic interpretation

“Diachronic interpretation”???
There is another way to think of Bayes’s theorem: it gives us a way to update the probability of a hypothesis, H, in light of some body of data, D.
There is another way to think about Bayes’s theorem: it gives us a way to update the probability of a hypothesis with the help of some data.

This way of thinking about Bayes’s theorem is called the diachronic interpretation. “Diachronic” means that something is happening over time; in this case the probability of the hypotheses changes, over time, as we see new data.
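This way of thinking about Bayes’s theorem is called the diachronic interpretation. “Diachronic” means that something changes over time; here, the probability of the hypothesis changes as we see new data.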

Rewriting Bayes’s theorem with H and D yields:

p(H|D) = p(H) p(D|H) / p(D)

In this interpretation, each term has a name:

  • p (H) is the probability of the hypothesis before we see the data, called the prior probability, or just prior.
  • p (H|D) is what we want to compute, the probability of the hypothesis after we see the data, called the posterior.
  • p (D|H) is the probability of the data under the hypothesis, called the likelihood.
  • p (D) is the probability of the data under any hypothesis, called the normalizing constant.

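p(H) is the probability of the hypothesis before we see the data, called the prior probability.
p(H|D) is what we want to compute: the probability of the hypothesis after we see the new data, called the posterior probability, or the revised probability.
p(D|H) is the probability of the data under the hypothesis, called the likelihood.
p(D) is the probability of the data under any hypothesis, called the normalizing constant.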

Sometimes we can compute the prior based on background information. For example, the cookie problem specifies that we choose a bowl at random with equal probability.
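Sometimes we can compute the prior from background information. For example, the cookie problem states that each bowl is chosen with equal probability.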

In other cases the prior is subjective; that is, reasonable people might disagree, either because they use different background information or because they interpret the same information differently.
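In other cases the prior is subjective: reasonable people may disagree, either because they draw on different background information or because they interpret the same information differently.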

The likelihood is usually the easiest part to compute. In the cookie problem, if we know which bowl the cookie came from, we find the probability of a vanilla cookie by counting.
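The likelihood is usually the easiest part to compute. In the cookie problem, once we know which bowl was chosen, we can work out the probability of a vanilla cookie.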

The normalizing constant can be tricky. It is supposed to be the probability of seeing the data under any hypothesis at all, but in the most general case it is hard to nail down what that means.
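The normalizing constant is harder to handle. It is meant to be the probability of the data under any hypothesis, but in the general case it is hard to pin down what that means.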

Most often we simplify things by specifying a set of hypotheses that are

  • Mutually exclusive: At most one hypothesis in the set can be true, and
  • Collectively exhaustive: There are no other possibilities; at least one of the hypotheses has to be true.

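We usually simplify matters by specifying a set of hypotheses that are
mutually exclusive: at most one hypothesis in the set is true, and
collectively exhaustive: there are no other possibilities, so at least one hypothesis is true.
I did not fully follow this at first; the example below helps.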

I use the word suite for a set of hypotheses that has these properties.

In the cookie problem, there are only two hypotheses—the cookie came from Bowl 1 or Bowl 2—and they are mutually exclusive and collectively exhaustive.
Mutually exclusive, complementary events: the cookie came from either Bowl 1 or Bowl 2.

In that case we can compute p (D) using the law of total probability, which says that if there are two exclusive ways that something might happen, you can add up the probabilities like this:

p(D) = p(B1) p(D|B1) + p(B2) p(D|B2)

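In that case we can compute p(D) with the law of total probability: if there are two mutually exclusive ways for something to happen, we add the probabilities as shown. This is where it clicked for me: choosing Bowl 1 and choosing Bowl 2 are mutually exclusive and complementary, so choosing one rules out the other, but what we need is the probability of drawing a vanilla cookie. That is the probability of choosing Bowl 1 and drawing vanilla from it, plus the probability of choosing Bowl 2 and drawing vanilla from it.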

Plugging in the values from the cookie problem, we have

p(D) = (1/2) (3/4) + (1/2) (1/2) = 5/8

which is what we computed earlier by mentally combining the two bowls.
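The law of total probability, and the posterior it leads to, can be sketched for the cookie problem as follows. The function name `total_probability` and the dictionary layout are choices made for this illustration, not something taken from the book.

```python
from fractions import Fraction

priors = {"Bowl 1": Fraction(1, 2), "Bowl 2": Fraction(1, 2)}
likelihoods = {"Bowl 1": Fraction(3, 4), "Bowl 2": Fraction(1, 2)}  # p(vanilla | bowl)

def total_probability(priors, likelihoods):
    """p(D) = sum of p(H) * p(D|H) over a mutually exclusive, exhaustive suite."""
    return sum(priors[h] * likelihoods[h] for h in priors)

p_d = total_probability(priors, likelihoods)

# Dividing each product by p(D) gives the posterior for each hypothesis.
posteriors = {h: priors[h] * likelihoods[h] / p_d for h in priors}

print(p_d)                   # 5/8
print(posteriors["Bowl 1"])  # 3/5
print(posteriors["Bowl 2"])  # 2/5
```

Because the two hypotheses form a suite, the posteriors sum to 1.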

The M&M problem

M&M’s are small candy-coated chocolates that come in a variety of colors. Mars, Inc., which makes M&M’s, changes the mixture of colors from time to time.
It was not obvious to me at first what this problem is about.

In 1995, they introduced blue M&M’s. Before then, the color mix in a bag of plain M&M’s was 30% Brown, 20% Yellow, 20% Red, 10% Green, 10% Orange, 10% Tan. Afterward it was 24% Blue , 20% Green, 16% Orange, 14% Yellow, 13% Red, 13% Brown.
Roughly, it is about how the proportions of candy colors in a bag changed between the earlier and later mixes.

Suppose a friend of mine has two bags of M&M’s, and he tells me that one is from 1994 and one from 1996. He won’t tell me which is which, but he gives me one M&M from each bag. One is yellow and one is green. What is the probability that the yellow one came from the 1994 bag?
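So the question is how to tell apart a bag from 1994 and a bag from 1996. One candy is taken from each bag; one is yellow and one is green. What is the probability that the yellow one came from the 1994 bag?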

This problem is similar to the cookie problem, with the twist that I draw one sample from each bowl/bag. This problem also gives me a chance to demonstrate the table method, which is useful for solving problems like this on paper. In the next chapter we will solve them computationally.

The first step is to enumerate the hypotheses. The bag the yellow M&M came from I’ll call Bag 1; I’ll call the other Bag 2. So the hypotheses are:

  • A: Bag 1 is from 1994, which implies that Bag 2 is from 1996.
  • B: Bag 1 is from 1996 and Bag 2 from 1994.

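The first step is to enumerate the hypotheses. Judging from the result later on, this step matters: the conclusion cannot be reached from the yellow candy alone, because ignoring the green one would describe a different situation.
So the hypotheses are: Bag 1 is from 1994 and Bag 2 from 1996, or Bag 1 is from 1996 and Bag 2 from 1994.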

Now we construct a table with a row for each hypothesis and a column for each term in Bayes’s theorem:


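[Table image: one row per hypothesis (A, B); columns for the prior p(H), the likelihood p(D|H), the product p(H) p(D|H), and the posterior p(H|D)]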

The first column has the priors. Based on the statement of the problem, it is reasonable to choose p (A) =p (B) =1 / 2.

The second column has the likelihoods, which follow from the information in the problem. For example, if A is true, the yellow M&M came from the 1994 bag with probability 20%, and the green came from the 1996 bag with probability 20%. Because the selections are independent, we get the conjoint probability by multiplying.
Because the selections are independent, we can multiply to get the conjoint probability.

The third column is just the product of the previous two. The sum of this column, 270, is the normalizing constant. To get the last column, which contains the posteriors, we divide the third column by the normalizing constant.
The sum of the third column is the normalizing constant.

That’s it. Simple, right?

Well, you might be bothered by one detail. I write p (D|H) in terms of percentages, not probabilities, which means it is off by a factor of 10,000. But that cancels out when we divide through by the normalizing constant, so it doesn’t affect the result.
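In other words, each likelihood factor is written as a percentage, which scales it by 100, so the product of two factors is scaled by 10,000; the result is unaffected.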

When the set of hypotheses is mutually exclusive and collectively exhaustive, you can multiply the likelihoods by any factor, if it is convenient, as long as you apply the same factor to the entire column.
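The table method for the M&M problem can be reproduced in a few lines. This is a sketch with illustrative names; the color mixes are the percentages quoted above, and the likelihoods are left in percent units as in the text, since the common factor cancels when normalizing.

```python
from fractions import Fraction

mix_1994 = {"brown": 30, "yellow": 20, "red": 20, "green": 10, "orange": 10, "tan": 10}
mix_1996 = {"blue": 24, "green": 20, "orange": 16, "yellow": 14, "red": 13, "brown": 13}

# Hypothesis A: yellow came from the 1994 bag, green from the 1996 bag.
# Hypothesis B: yellow came from the 1996 bag, green from the 1994 bag.
priors = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
likelihoods = {
    "A": mix_1994["yellow"] * mix_1996["green"],  # 20 * 20, in percent units
    "B": mix_1996["yellow"] * mix_1994["green"],  # 14 * 10, in percent units
}

products = {h: priors[h] * likelihoods[h] for h in priors}
normalizer = sum(products.values())  # 270, as stated in the text
posteriors = {h: products[h] / normalizer for h in priors}

for h in priors:
    print(h, priors[h], likelihoods[h], products[h], posteriors[h])
# A 1/2 400 200 20/27
# B 1/2 140 70 7/27
```

So under these assumptions the probability that the yellow M&M came from the 1994 bag is 20/27.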

The Monty Hall problem

The Monty Hall problem might be the most contentious question in the history of probability. The scenario is simple, but the correct answer is so counterintuitive that many people just can’t accept it, and many smart people have embarrassed themselves not just by getting it wrong but by arguing the wrong side, aggressively, in public.
This is a problem based on a game show.

Monty Hall was the original host of the game show Let’s Make a Deal. The Monty Hall problem is based on one of the regular games on the show. If you are on the show, here’s what happens:

  • Monty shows you three closed doors and tells you that there is a prize behind each door: one prize is a car, the other two are less valuable prizes like peanut butter and fake finger nails. The prizes are arranged at random.
  • The object of the game is to guess which door has the car. If you guess right, you get to keep the car.
  • You pick a door, which we will call Door A. We’ll call the other doors B and C.
  • Before opening the door you chose, Monty increases the suspense by opening either Door B or C, whichever does not have the car. (If the car is actually behind Door A, Monty can safely open B or C, so he chooses one at random.)
  • Then Monty offers you the option to stick with your original choice or switch to the one remaining unopened door.

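The game is to guess which door hides the car; if you guess right, you keep the car. Before opening the door you chose, the host opens Door B or Door C, whichever does not hide the car. If the car is behind Door A, the host can safely open either B or C, so he picks one at random. The host then lets you either stick with your original choice or switch to the other unopened door.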

The question is, should you “stick” or “switch” or does it make no difference?
The question is whether sticking or switching makes any difference.

Most people have the strong intuition that it makes no difference. There are two doors left, they reason, so the chance that the car is behind Door A is 50%.
Many people think it makes no difference: only two doors are left, so each has a 50% chance.

But that is wrong. In fact, the chance of winning if you stick with Door A is only 1/3; if you switch, your chances are 2/3.
But that is wrong. In fact, the probability of winning is 1/3 if you stick and 2/3 if you switch.
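One way to make the 1/3 versus 2/3 claim plausible before doing any algebra is to simulate the game as described above. This is a Monte Carlo sketch, not part of the book’s argument; the function name `play`, the seed, and the number of trials are arbitrary choices.

```python
import random

def play(switch, rng):
    """Simulate one game; return True if the contestant wins the car."""
    doors = ["A", "B", "C"]
    car = rng.choice(doors)
    pick = "A"  # the contestant's door is always called A
    # Monty opens a door that is neither the pick nor the car,
    # choosing at random when he has two options.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 100_000
stick_rate = sum(play(False, rng) for _ in range(trials)) / trials
switch_rate = sum(play(True, rng) for _ in range(trials)) / trials

# The two rates should land close to 1/3 and 2/3 respectively.
print(stick_rate, switch_rate)
```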

By applying Bayes’s theorem, we can break this problem into simple pieces, and maybe convince ourselves that the correct answer is, in fact, correct.
With Bayes’s theorem we can break the problem into simple pieces and convince ourselves that the correct answer really is correct.

To start, we should make a careful statement of the data. In this case D consists of two parts: Monty chooses Door B and there is no car there.
First we must state the data precisely. Here the data has two parts: the host chose Door B, and there is no car behind it.

Next we define three hypotheses: A, B, and C represent the hypothesis that the car is behind Door A, Door B, or Door C. Again, let’s apply the table method:
Next we define three hypotheses: A, B, and C mean that the car is behind Door A, Door B, or Door C. Now we build the table:


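[Table image: one row per hypothesis (A, B, C); columns for the prior p(H), the likelihood p(D|H), the product p(H) p(D|H), and the posterior p(H|D)]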

Filling in the priors is easy because we are told that the prizes are arranged at random, which suggests that the car is equally likely to be behind any door.
Because the prizes are arranged at random, the prior for each door is 1/3.

Figuring out the likelihoods takes some thought, but with reasonable care we can be confident that we have it right:

  • If the car is actually behind A, Monty could safely open Doors B or C. So the probability that he chooses B is 1/2. And since the car is actually behind A, the probability that the car is not behind B is 1.
  • If the car is actually behind B, Monty has to open door C, so the probability that he opens door B is 0.
  • Finally, if the car is behind Door C, Monty opens B with probability 1 and finds no car there with probability 1.

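To restate the reasoning above: first be clear about what each term means. The prior is the probability of a hypothesis before the new information arrives. The likelihood is the probability of D given that the corresponding hypothesis is true. For A: if the car really is behind A, then neither B nor C hides a car, so the probability of the new information (Door B is opened and there is no car) is 1/2, because the host could have opened either B or C. For B, the likelihood is 0, because the host would never open Door B if the car were behind it. For C: if the car is behind C and the contestant chose A, the host opens Door B with probability 100%. The key is to read this term as “the probability that the host opens B, given that the car is behind the door named by the hypothesis.”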

Now the hard part is over; the rest is just arithmetic. The sum of the third column is 1/2. Dividing through yields p(A|D) = 1/3 and p(C|D) = 2/3. So you are better off switching.
The result shows that we are better off switching.
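The table for the standard Monty Hall problem can be computed directly from the priors and likelihoods given above. This is a minimal sketch with illustrative variable names, using exact fractions.

```python
from fractions import Fraction

# Priors: the car is equally likely to be behind any door.
priors = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}
# p(D|H): probability that Monty opens Door B and reveals no car.
likelihoods = {"A": Fraction(1, 2), "B": Fraction(0), "C": Fraction(1)}

products = {h: priors[h] * likelihoods[h] for h in priors}
normalizer = sum(products.values())  # the sum of the third column
posteriors = {h: products[h] / normalizer for h in priors}

for h in priors:
    print(h, posteriors[h])
# A 1/3
# B 0
# C 2/3
```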

There are many variations of the Monty Hall problem. One of the strengths of the Bayesian approach is that it generalizes to handle these variations.
The Monty Hall problem has many variations.

For example, suppose that Monty always chooses B if he can, and only chooses C if he has to (because the car is behind B). In that case the revised table is:


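[Table image: revised table for the variation in which Monty always opens Door B if he can; same rows and columns as before]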

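The only change is p(D|A). If the car is behind A, Monty can choose to open B or C. But in this variation he always chooses B, so p(D|A) = 1.
The only difference is p(D|A), the probability that the host opens Door B when the car is behind Door A. Because in this variation the host never opens C when he can open B, this probability is 1 rather than 1/2.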

As a result, the likelihoods are the same for A and C, and the posteriors are the same: p(A|D) = p(C|D) = 1/2. In this case, the fact that Monty chose B reveals no information about the location of the car, so it doesn’t matter whether the contestant sticks or switches.
This is well put: the host choosing Door B brings no additional information. Only a choice that carries information can shift the probabilities.

On the other hand, if he had opened C, we would know p(B|D) = 1.
Put differently, if he had opened C, we would know that the car is behind Door B.
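Both cases of this variation can be checked with the same table method. This is a sketch; the helper name `posteriors` and the two likelihood dictionaries are introduced here for illustration, with the likelihoods read off from the description of Monty’s rule (open B whenever he can, open C only when the car is behind B).

```python
from fractions import Fraction

def posteriors(priors, likelihoods):
    """Table method: multiply each prior by its likelihood, then normalize."""
    products = {h: priors[h] * likelihoods[h] for h in priors}
    normalizer = sum(products.values())
    return {h: products[h] / normalizer for h in priors}

priors = {h: Fraction(1, 3) for h in "ABC"}

# p(Monty opens B | car location) under the variation.
opens_b = {"A": Fraction(1), "B": Fraction(0), "C": Fraction(1)}
# p(Monty opens C | car location): only happens when the car is behind B.
opens_c = {"A": Fraction(0), "B": Fraction(1), "C": Fraction(0)}

after_b = posteriors(priors, opens_b)
after_c = posteriors(priors, opens_c)

print(after_b["A"], after_b["C"])  # 1/2 1/2
print(after_c["B"])                # 1
```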

I included the Monty Hall problem in this chapter because I think it is fun, and because Bayes’s theorem makes the complexity of the problem a little more manageable. But it is not a typical use of Bayes’s theorem, so if you found it confusing, don’t worry!

Discussion

For many problems involving conditional probability, Bayes’s theorem provides a divide-and-conquer strategy. If p(A|B) is hard to compute, or hard to measure experimentally, check whether it might be easier to compute the other terms in Bayes’s theorem, p(B|A), p(A) and p(B).
In effect, Bayesian thinking adopts a divide-and-conquer strategy.

If the Monty Hall problem is your idea of fun, I have collected a number of similar problems in an article called “All your Bayes are belong to us,” which you can read at http://allendowney.blogspot.com/2011/10/all-your-bayes-are-belong-to-us.html.

The author kindly provides a collection of practice problems.
