3.3 Probability Distributions
3.3.1 Discrete Random Variables
If a random variable takes only finitely many or countably many values, it is called a discrete random variable, or simply a discrete variable.
For example, if you toss a coin twice, there are only 4 possible outcomes:
HH, HT, TH and TT (H: heads; T: tails)
Let the random variable X be the number of heads in this experiment. X can only take the values 0, 1 and 2, so X is a discrete random variable. Specifically:
P(X=0)=0.25
P(X=1)=0.5
P(X=2)=0.25
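P(X) is the probability distribution of X (for a discrete variable, also called its probability mass function, PMF). It satisfies the following properties:
(1) $P(X=x_i) \ge 0$ for every possible value $x_i$
(2) $\sum_i P(X=x_i) = 1$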
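The coin example can be reproduced by brute-force enumeration. The following is a minimal Python sketch. It assumes a fair coin, so each of the 4 outcomes has probability 1/4. The helper name `heads_distribution` is hypothetical. The sketch also checks the two properties above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(n_tosses: int) -> dict[int, Fraction]:
    """Distribution of the number of heads in n tosses of a fair coin (hypothetical helper)."""
    outcomes = list(product("HT", repeat=n_tosses))  # e.g. ('H', 'T') stands for HT
    counts = Counter(outcome.count("H") for outcome in outcomes)
    total = len(outcomes)
    # All outcomes are equally likely, so P(X=k) = (#outcomes with k heads) / total
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}

dist = heads_distribution(2)
print({k: float(v) for k, v in dist.items()})  # {0: 0.25, 1: 0.5, 2: 0.25}

# Property (1): no negative probabilities; property (2): they sum to 1
assert all(v >= 0 for v in dist.values())
assert sum(dist.values()) == 1
```

Using `Fraction` keeps the probabilities exact, so the "sums to 1" check holds without any floating-point tolerance.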
3.3.2 Continuous Random Variables
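Suppose X is a random variable and there exists a non-negative real function f(x) such that, for any region D, the probability that X falls in D is
$$P(X \in D) = \int_D f(x)\,dx.$$
Then X is called a continuous random variable, or simply a continuous variable. The function f(x) is called the probability density function of X, or simply its density.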
From the definition, a density function has the following properties:
(1)f(x)≥0
(2) $\int_{-\infty}^{+\infty} f(x)\,dx = 1$
(3) $P(a < X \le b) = \int_a^b f(x)\,dx$ for any $a \le b$
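Properties (2) and (3) can be checked numerically by integrating a density with the midpoint rule. This sketch uses the exponential density f(x) = λe^{-λx} for x ≥ 0 with λ = 1.5. That density is a hypothetical example, chosen only because its integrals have a closed form. The names `LAM`, `f` and `integrate` are illustrative.

```python
import math

LAM = 1.5  # hypothetical rate parameter of an exponential density

def f(x: float) -> float:
    """Exponential density: non-negative everywhere and zero for x < 0."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0

def integrate(func, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

# Property (2): total probability is 1 (the mass beyond x = 40 is negligible for this f)
total = integrate(f, 0.0, 40.0)

# Property (3): P(a < X <= b) is the area under f between a and b
a, b = 0.5, 2.0
p_ab = integrate(f, a, b)
exact = math.exp(-LAM * a) - math.exp(-LAM * b)  # closed form for the exponential case
print(round(total, 6), round(p_ab, 6), round(exact, 6))
```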
Summary of discrete and continuous random variables:

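Mean and variance of a discrete variable with a given probability distribution P(X):
$$E(X) = \sum_i x_i\,P(X=x_i)$$
$$\mathrm{Var}(X) = \sum_i \bigl(x_i - E(X)\bigr)^2 P(X=x_i) = E(X^2) - \bigl(E(X)\bigr)^2$$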
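These two formulas translate directly into code. The following is a minimal sketch. It assumes the distribution is given as a Python dict that maps each value x_i to P(X=x_i). `mean_and_variance` is a hypothetical helper name.

```python
def mean_and_variance(pmf: dict[float, float]) -> tuple[float, float]:
    """Return (E(X), Var(X)) for a discrete variable given as {x_i: P(X=x_i)}."""
    if abs(sum(pmf.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in pmf.items())               # E(X) = sum x_i P(X=x_i)
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # sum (x_i - E(X))^2 P(X=x_i)
    return mean, var

# Number of heads in two tosses of a fair coin
print(mean_and_variance({0: 0.25, 1: 0.5, 2: 0.25}))  # (1.0, 0.5)

# A 0-1 variable with a hypothetical p = 0.3: result should be close to (0.3, 0.21)
print(mean_and_variance({1: 0.3, 0: 0.7}))
```

The variance is computed from its definition rather than as E(X²) - (E(X))², because the definitional form is less prone to rounding loss when the mean is large.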
3.3.3 The 0-1(p) Distribution
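A random variable X follows the 0-1(p) distribution if $P(X=1) = p$ and $P(X=0) = 1-p$, with $0 < p < 1$. Then:
$$E(X) = 1 \times p + 0 \times (1-p) = p$$
$$\mathrm{Var}(X) = E(X^2) - \bigl(E(X)\bigr)^2 = \bigl(1^2 \times p + 0^2 \times (1-p)\bigr) - p^2 = p - p^2 = p(1-p)$$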
3.3.4 Binomial Distribution (Bernoulli Trials)
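Definition: consider n independent, repeated trials. Each trial has only two outcomes, A and its complement Ā. The probability of A is the same in every trial, written P(A) = p with 0 < p < 1. Such a sequence of trials is called an n-fold Bernoulli trial.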
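In n-fold Bernoulli trials with P(A) = p, 0 < p < 1, let X be the number of times A occurs in the n trials. Then X follows the binomial distribution B(n, p):
$$P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n$$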
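Write $X = X_1 + X_2 + \dots + X_n$, where $X_i = 1$ if A occurs in trial i and $X_i = 0$ otherwise. Each $X_i$ then follows the 0-1(p) distribution, so
$$E(X) = E(X_1 + X_2 + \dots + X_n) = E(X_1) + E(X_2) + \dots + E(X_n) = p + p + \dots + p = np$$
Because the trials are independent, the variances also add:
$$\mathrm{Var}(X) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n) = p(1-p) + p(1-p) + \dots + p(1-p) = np(1-p)$$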
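E(X) = np and Var(X) = np(1-p) can be checked numerically by summing over the binomial probabilities directly. This sketch uses arbitrary illustrative values n = 10 and p = 0.3, which are not from the text. `binom_pmf` is a hypothetical helper built on Python's `math.comb`.

```python
from math import comb, isclose

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): choose which k of the n trials succeed."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # arbitrary illustrative parameters
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

assert isclose(sum(pmf), 1.0)         # probabilities sum to 1
assert isclose(mean, n * p)           # E(X) = np
assert isclose(var, n * p * (1 - p))  # Var(X) = np(1-p)
print(mean, var)
```

The same helper answers the coin question that follows: `binom_pmf(5, 5, 0.5)` returns 0.03125, i.e. 1/32.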
Example of a Binomial distribution
When a fair coin is flipped, heads and tails are equally likely, i.e., p = 0.5.
If we flip the coin 5 times, what is the probability of getting 5 heads?
Answer: $P(X=5) = \binom{5}{5}\, 0.5^5 (1-0.5)^0 = 1/32 \approx 0.031$
Example of a Binomial distribution
After a genome-wide ChIP-seq experiment, a transcription factor (TF) was found to bind the promoter regions of 100 genes (out of 26,000). Suppose we now run another experiment with a second TF and also identify 100 genes. What is the probability that at least 5 of them are also bound by the first TF?
Suppose the first TF binds genes without any preference. Then the probability that a gene randomly selected from the genome is bound by the first TF is 100/26000 ≈ 0.0039.
Each gene is either bound by the first TF ('success') or not ('failure'), i.e., each gene is a Bernoulli trial.
If the second TF is independent of the first TF, then the number of genes bound by the second TF that are also bound by the first TF follows a binomial distribution.
Binomial distribution: n = 100, p = 0.0039
P(k=0)=0.6765408
P(k=1)=0.2648840
P(k=2)=0.05133606
P(k=3)=0.006565821
P(k=4)=0.0006233937
P(k>=5)=1-P(k=0)-P(k=1)-P(k=2)-P(k=3)-P(k=4)=4.992756e-05
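The table above can be reproduced with a few lines of standard-library Python. The following is a sketch that reuses the rounded p = 0.0039 from the text. `binom_pmf` is a hypothetical helper, not a library function.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 100, 0.0039  # 100 genes from the second TF; p as rounded in the text

for k in range(5):
    print(f"P(k={k}) = {binom_pmf(k, n, p):.7g}")

# Upper tail P(k >= 5). Summing it directly equals 1 - P(k<=4)
# but avoids subtracting two nearly equal numbers.
tail = sum(binom_pmf(k, n, p) for k in range(5, n + 1))
print(f"P(k>=5) = {tail:.7g}")
```

If SciPy is installed, `scipy.stats.binom.pmf(k, 100, 0.0039)` and `scipy.stats.binom.sf(4, 100, 0.0039)` should give the same values. The second call is the survival function, which returns P(k > 4) = P(k ≥ 5). Using the unrounded p = 100/26000 shifts the numbers slightly.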
3.3.5 Negative Binomial Distribution
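Definition: an experiment consists of a sequence of independent trials. Each trial has two outcomes, success or failure, and the probability of success p is constant. The experiment continues until r successes have occurred, where r is a positive integer. The number of trials X needed to reach the r-th success then follows the negative binomial distribution:
$$P(X=x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}, \quad x = r, r+1, r+2, \dots$$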
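Mean and variance of the negative binomial distribution (X = number of trials until the r-th success):
$$E(X) = \frac{r}{p}, \qquad \mathrm{Var}(X) = \frac{r(1-p)}{p^2}$$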
Alternative formulation of Negative Binomial distribution

Example of negative binomial distribution
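If a predator must capture 10 prey before it can grow large enough to reproduce, and the probability of capturing a prey on any given day is 0.1, what would the mean age at onset of reproduction be?
Answer: the number of days X until the 10th capture follows a negative binomial distribution with r = 10 and p = 0.1. So $E(X) = r/p = 10/0.1 = 100$ days and $\mathrm{Var}(X) = r(1-p)/p^2 = 10 \times 0.9/0.01 = 900$.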
The expected time is 100 days. However, the variance is high (900, i.e., a standard deviation of 30 days) and the distribution is right-skewed. Some predators will reach reproductive age much sooner than the average, and some much later.
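The closed-form mean and variance can be cross-checked by summing the probability function directly. This Python sketch assumes the "number of trials until the r-th success" formulation used above. `nbinom_pmf` is a hypothetical helper. The infinite support is truncated at 3000 days, far beyond where any noticeable probability mass remains.

```python
from math import comb, sqrt

def nbinom_pmf(x: int, r: int, p: float) -> float:
    """P(X = x): the r-th success occurs exactly on trial x (x = r, r+1, ...)."""
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)

r, p = 10, 0.1  # 10 prey needed; daily capture probability 0.1
xs = range(r, 3000)  # truncated support; the mass beyond 3000 days is negligible
probs = [nbinom_pmf(x, r, p) for x in xs]

mean = sum(x * q for x, q in zip(xs, probs))
var = sum((x - mean) ** 2 * q for x, q in zip(xs, probs))

# Median: first day by which the cumulative probability reaches 1/2
cumulative = 0.0
for x, q in zip(xs, probs):
    cumulative += q
    if cumulative >= 0.5:
        median = x
        break

print(f"mean={mean:.3f}  variance={var:.3f}  sd={sqrt(var):.3f}  median={median}")
```

The mean and variance should match r/p = 100 and r(1-p)/p² = 900. The median falls below the mean, which is one way to see the right skew.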
3.3.6 Geometric Distribution
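Definition: in a sequence of Bernoulli trials with success probability p, the geometric distribution gives the probability that the first success occurs on the k-th trial. That is, the first k-1 trials all fail and the k-th trial succeeds:
$$P(X=k) = (1-p)^{k-1} p, \quad k = 1, 2, \dots$$
$$E(X) = \frac{1}{p}, \qquad \mathrm{Var}(X) = \frac{1-p}{p^2}$$
This is the special case r = 1 of the negative binomial distribution.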
Example of geometric distribution
If the probability of extinction of an endangered population is estimated to be 0.1 every year, what is the expected time until extinction?
Answer: $E(X) = 1/p = 1/0.1 = 10$ years.
The expected time is 10 years. However, the variance is large ($\mathrm{Var}(X) = 0.9/0.1^2 = 90$), so it will be difficult to predict accurately the year in which the population goes extinct.
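The spread is easier to see from the cumulative probabilities. This is a minimal sketch that assumes the yearly extinction probability is constant at 0.1, as in the example. `geom_pmf` and `geom_cdf` are hypothetical helper names.

```python
def geom_pmf(k: int, p: float) -> float:
    """P(X = k) = (1-p)^(k-1) * p: k-1 failures, then the first success."""
    return (1 - p) ** (k - 1) * p

def geom_cdf(k: int, p: float) -> float:
    """P(X <= k) = 1 - (1-p)^k, i.e. at least one success within k trials."""
    return 1 - (1 - p) ** k

p = 0.1  # yearly extinction probability from the example
mean = 1 / p          # E(X) = 1/p
var = (1 - p) / p**2  # Var(X) = (1-p)/p^2
print(f"mean={mean:.1f} years, variance={var:.1f}, sd={var ** 0.5:.2f}")

for year in (1, 5, 10, 20, 30):
    print(f"P(extinct within {year:2d} years) = {geom_cdf(year, p):.3f}")
```

Under this model, about 41% of such populations go extinct within 5 years (1 - 0.9^5 ≈ 0.41). Yet about 12% survive past year 20 (0.9^20 ≈ 0.12). That wide range is why a single predicted extinction year is unreliable.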


