Frequentist vs. Bayesian (Maximum Likelihood Estimation vs. Maximum A Posteriori)

\theta - Quantity of interest or parameter of a model

  • Frequentist: \theta is treated as an unknown but fixed (deterministic) quantity.
    Example: Flip a coin 100 times and observe 60 heads; the estimate is then P(head)=\theta =\frac{60}{100}=0.6. As the sample size approaches infinity, this frequentist estimate converges to the true value of \theta. With very little data, however, the estimate can be far from the true value. In short, the more data, the more accurate the frequentist estimate of \theta.
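
The dependence on sample size can be illustrated with a small simulation. This is only a sketch: the true value 0.6, the sample sizes, and the helper names `mle_head_probability` and `simulate_flips` are assumptions chosen for illustration, not part of the original example.

```python
import random


def mle_head_probability(flips):
    """Frequentist estimate of P(head): the fraction of flips that are heads (1)."""
    return sum(flips) / len(flips)


def simulate_flips(true_theta, n, seed):
    """Draw n coin flips (1 = head, 0 = tail) from a coin with P(head) = true_theta."""
    rng = random.Random(seed)
    return [1 if rng.random() < true_theta else 0 for _ in range(n)]


# The example from the text: 60 heads in 100 flips.
observed = [1] * 60 + [0] * 40
theta_hat = mle_head_probability(observed)
print("estimate from 60/100 heads:", theta_hat)

# Assumed true value 0.6: small samples can land far from it,
# large samples tend to land close to it.
for n in (5, 100, 10000):
    flips = simulate_flips(0.6, n, seed=0)
    print(n, "flips ->", mle_head_probability(flips))
```

With only a handful of flips the estimate can only take a few coarse values (for 5 flips: 0, 0.2, 0.4, ...), which is one way to see why scarce data gives unreliable estimates.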

  • Bayesian: \theta is treated as a random variable. There are two inputs: the prior P(\theta) and the likelihood P(X|\theta). There is one output: the posterior P(\theta|X).
    Bayesian estimation is based on Bayes' rule:
    P(\theta|X)=\frac{P(X|\theta) \times P(\theta)}{P(X)}
    Therefore, posterior \propto likelihood \times prior. The observation X (the sample data) is given, since it is the condition in P(\theta|X), so P(X) does not depend on \theta and acts as a constant.
    Example: Consider the coin flip again. Here \theta = P(head) is described by the distribution P(\theta|X), instead of by the single deterministic value 0.6. As the amount of sample data increases, P(\theta|X) relies more on the measurements and less on the prior.
    Note: if the prior P(\theta) is a uniform distribution, the maximum of the posterior coincides with the frequentist (maximum likelihood) estimate.
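
The coin example can be made concrete with a Beta prior, which is an assumption introduced here for illustration (the text above does not specify a prior family). For coin flips, a Beta(a, b) prior gives a Beta(a + heads, b + tails) posterior, and Beta(1, 1) is the uniform prior. The function names below are hypothetical.

```python
def beta_posterior(heads, tails, prior_a=1.0, prior_b=1.0):
    """Posterior Beta parameters after observing coin flips under a Beta prior."""
    return prior_a + heads, prior_b + tails


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_mode(a, b):
    """Mode of a Beta(a, b) distribution (valid for a > 1 and b > 1)."""
    return (a - 1) / (a + b - 2)


# Uniform prior Beta(1, 1): the posterior mode equals the frequentist estimate 60/100.
a_post, b_post = beta_posterior(60, 40)
print("posterior:", (a_post, b_post), "mode:", beta_mode(a_post, b_post))

# Assumed prior Beta(10, 10), i.e. a belief that the coin is roughly fair.
# The same 60% head rate is observed with 10 flips and with 1000 flips.
mean_small = beta_mean(*beta_posterior(6, 4, 10.0, 10.0))
mean_large = beta_mean(*beta_posterior(600, 400, 10.0, 10.0))
print("posterior mean, 10 flips:  ", mean_small)
print("posterior mean, 1000 flips:", mean_large)
```

With 10 flips the posterior mean stays close to the prior belief of 0.5, while with 1000 flips it moves close to the observed rate of 0.6, matching the statement that more data shifts trust from the prior to the measurements.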

  • Maximum Likelihood Estimation (MLE) - frequentist method
    Given a set of observations X=(x_1, x_2, ..., x_n), a random sample that is independent and identically distributed (i.i.d.), the MLE estimate of \theta can be expressed as:
    \begin{align} \hat{\theta}_{MLE} &= \arg\max_{\theta} P(X;\theta)\\ &= \arg\max_{\theta} P(x_1;\theta)P(x_2;\theta) \cdots P(x_n;\theta)\\ &= \arg\max_{\theta} \log \prod \limits_{i=1}^n P(x_i;\theta)\\ &= \arg\max_{\theta} \sum_{i=1}^n \log P(x_i;\theta)\\ &= \arg\min_{\theta} -\sum_{i=1}^n \log P(x_i;\theta) \end{align}
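    The expression minimized in the last line is called the negative log-likelihood (NLL). Taking the logarithm does not change the maximizer because \log is monotonically increasing.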
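
As a rough check of the derivation, the NLL of the coin example can be minimized numerically and compared with the closed-form answer 0.6. This is a minimal sketch assuming a Bernoulli model for each flip; the grid search and the names `bernoulli_nll` and `argmin_on_grid` are illustrative choices, not a recommended optimizer.

```python
import math


def bernoulli_nll(theta, flips):
    """Negative log-likelihood of i.i.d. coin flips (1 = head) for P(head) = theta."""
    return -sum(math.log(theta if x == 1 else 1.0 - theta) for x in flips)


def argmin_on_grid(objective, grid):
    """Return the grid point with the smallest objective value."""
    return min(grid, key=objective)


flips = [1] * 60 + [0] * 40

# Candidate values 0.001, 0.002, ..., 0.999 (endpoints excluded to avoid log(0)).
grid = [i / 1000 for i in range(1, 1000)]
theta_mle = argmin_on_grid(lambda t: bernoulli_nll(t, flips), grid)

print("theta minimizing the NLL:", theta_mle)
print("fraction of heads:       ", sum(flips) / len(flips))
```

Both numbers agree, which is consistent with the fact that for a coin the NLL minimizer is simply the observed fraction of heads.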

  • Maximum A Posteriori (MAP) - Bayesian method
    Given a set of observations X=(x_1, x_2, ..., x_n), a random sample that is independent and identically distributed (i.i.d.), the MAP estimate of \theta can be expressed as:
    \begin{align} \hat{\theta}_{MAP} &= \arg\max_{\theta} P(\theta|X)\\ &= \arg\min_{\theta} - \log P(\theta|X)\\ &= \arg\min_{\theta} - \log P(X|\theta) - \log P(\theta) + \log P(X)\\ &= \arg\min_{\theta} - \log P(X|\theta) - \log P(\theta) \end{align}
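    The term \log P(X) is dropped in the last line because it does not depend on \theta. Suppose the prior is a zero-mean Gaussian distribution with variance \sigma^2:
    P(\theta) = const \times e^{-\frac{\theta^2}{2\sigma^2}}
    Then
    - \log P(\theta) = const +\frac{\theta^2}{2\sigma^2}
    so the MAP objective is the NLL plus a quadratic (L2) penalty on \theta. The smaller \sigma is, the more strongly the estimate is pulled toward zero.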
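
The effect of the Gaussian prior can be sketched on a simple case: estimating the mean \theta of Gaussian data with known noise standard deviation. This setup, the sample values, and the function names are assumptions made for illustration. Setting the derivative of the MAP objective to zero gives the closed form used in `gaussian_mean_map`.

```python
def gaussian_mean_mle(xs):
    """MLE of a Gaussian mean: the sample average."""
    return sum(xs) / len(xs)


def map_objective(theta, xs, prior_sigma, noise_sigma=1.0):
    """NLL of the data plus the Gaussian-prior penalty (constants dropped)."""
    nll = sum((x - theta) ** 2 for x in xs) / (2.0 * noise_sigma ** 2)
    penalty = theta ** 2 / (2.0 * prior_sigma ** 2)
    return nll + penalty


def gaussian_mean_map(xs, prior_sigma, noise_sigma=1.0):
    """Closed-form minimizer of map_objective with respect to theta."""
    data_precision = len(xs) / noise_sigma ** 2
    prior_precision = 1.0 / prior_sigma ** 2
    return (sum(xs) / noise_sigma ** 2) / (data_precision + prior_precision)


xs = [2.0, 3.0, 4.0]  # hypothetical observations
theta_mle = gaussian_mean_mle(xs)
theta_map_tight = gaussian_mean_map(xs, prior_sigma=1.0)     # strong prior
theta_map_loose = gaussian_mean_map(xs, prior_sigma=1000.0)  # nearly flat prior

print("MLE:                ", theta_mle)
print("MAP, prior sigma=1: ", theta_map_tight)
print("MAP, prior sigma=1000:", theta_map_loose)
```

A tight prior shrinks the estimate toward zero, while a very wide prior leaves it almost equal to the MLE, in line with the note that a flat prior recovers the frequentist result.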

https://www.sohu.com/a/215176689_610300
