$\theta$ - Quantity of interest or parameter of a model

Frequentist: $\theta$ is described as unknown and deterministic
Example: Toss a coin 100 times and observe 60 heads. The frequentist estimate is then $P(head)=\theta =\frac{60}{100}=0.6$ . As the sample size approaches infinity, this estimate converges to the true value of $\theta$ . With very few samples, however, the estimate can deviate severely from the true value (for example, 3 heads in 3 tosses gives $\theta=1$ ). In short, the more data, the more accurate the frequentist estimate of $\theta$ .
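
The coin example can be sketched in a few lines of Python. This is only an illustration: the helper names `estimate_theta` and `simulate`, the true value 0.6 used for simulation, and the fixed seed are assumptions made here, not part of the original notes.

```python
import random


def estimate_theta(flips):
    """Frequentist point estimate: fraction of heads (1 = head, 0 = tail)."""
    return sum(flips) / len(flips)


# The example from the text: 60 heads in 100 flips.
flips = [1] * 60 + [0] * 40
theta_hat = estimate_theta(flips)
print("estimate from 60/100:", theta_hat)

# Simulate a hypothetical coin whose true P(head) is 0.6 (assumed value)
# to compare the estimate for small and large sample sizes.
rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility


def simulate(n, true_theta=0.6):
    """Draw n independent flips with P(head) = true_theta."""
    return [1 if rng.random() < true_theta else 0 for _ in range(n)]


for n in (5, 50, 5000):
    print(n, estimate_theta(simulate(n)))
```

With only a handful of flips the printed estimate can land far from 0.6, while the estimate from thousands of flips is typically much closer.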
Bayesian: $\theta$ is described as random. There are two inputs, the prior $P(\theta)$ and the likelihood $P(X|\theta)$ , and one output, the posterior $P(\theta|X)$ .
Bayesian estimation is based on Bayes' rule:
$P(\theta|X)=\frac{P(X|\theta) \times P(\theta)}{P(X)}$
Because the observation $X$ (the sample data) is given as the condition in $P(\theta|X)$ , it is fixed, so $P(X)$ is a constant that does not depend on $\theta$ . Therefore, we have $posterior \propto likelihood \times prior$ .
Example: Consider the coin-flipping example again. Here $\theta=P(head)$ is described by the distribution $P(\theta|X)$ rather than by the single deterministic value 0.6. As the amount of sample data increases, $P(\theta|X)$ relies more on the measurements and less on the prior.
Note: if the prior $P(\theta)$ is a uniform distribution, then the posterior is proportional to the likelihood, so the most probable $\theta$ under the Bayesian method coincides with the frequentist (maximum likelihood) estimate.
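
To make the Bayesian coin example concrete, here is a minimal sketch that assumes a Beta prior on $\theta$ . The Beta family is an assumption introduced for illustration (the notes do not name a prior); it is convenient because the posterior after observing coin flips is again a Beta distribution. The function names and the Beta(50, 50) prior are hypothetical choices.

```python
def beta_posterior(a, b, heads, tails):
    """Posterior Beta parameters for a Beta(a, b) prior and observed flips."""
    return a + heads, b + tails


def beta_mode(a, b):
    """Most probable value of Beta(a, b); the formula needs a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("mode formula requires a > 1 and b > 1")
    return (a - 1) / (a + b - 2)


# Uniform prior Beta(1, 1): the posterior mode equals heads / n.
a_u, b_u = beta_posterior(1, 1, 60, 40)
map_uniform = beta_mode(a_u, b_u)

# Hypothetical informative prior Beta(50, 50), centred on a fair coin.
a_s, b_s = beta_posterior(50, 50, 60, 40)
map_small = beta_mode(a_s, b_s)

# Same prior, ten times more data with the same head ratio.
a_l, b_l = beta_posterior(50, 50, 600, 400)
map_large = beta_mode(a_l, b_l)

print(map_uniform, map_small, map_large)
```

Under these assumptions the uniform prior reproduces 0.6, the informative prior pulls the estimate toward 0.5, and the pull weakens as the number of flips grows.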
Maximum Likelihood Estimation (MLE) - frequentist method
Given a set of observations $X=(x_1, x_2, ..., x_n)$ , a random sample that is independent and identically distributed (i.i.d.), the MLE estimate of $\theta$ can be expressed as:
$\begin{aligned} \hat{\theta}_{MLE} &= \arg\max_{\theta} P(X;\theta)\\ &= \arg\max_{\theta} P(x_1;\theta)P(x_2;\theta) \cdots P(x_n;\theta)\\ &= \arg\max_{\theta} \log \prod_{i=1}^n P(x_i;\theta)\\ &= \arg\max_{\theta} \sum_{i=1}^n \log P(x_i;\theta)\\ &= \arg\min_{\theta} -\sum_{i=1}^n \log P(x_i;\theta) \end{aligned}$
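Taking the logarithm does not change the maximizer because $\log$ is monotonically increasing. The objective in the last line of the equation above is called the negative log-likelihood (NLL).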
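
As a rough check of the NLL formulation, the sketch below minimizes the NLL of the coin data over a grid of candidate values. It assumes a Bernoulli model for each flip and uses a simple grid search in place of a proper optimizer; `bernoulli_nll` and the grid resolution are illustrative choices.

```python
import math


def bernoulli_nll(theta, flips):
    """Negative log-likelihood of i.i.d. flips (1 = head, 0 = tail)."""
    return -sum(
        math.log(theta) if x == 1 else math.log(1.0 - theta) for x in flips
    )


# 60 heads and 40 tails, as in the coin example.
flips = [1] * 60 + [0] * 40

# Candidate values strictly inside (0, 1) so that both logs are defined.
grid = [k / 1000 for k in range(1, 1000)]

# The MLE is the candidate with the smallest NLL.
theta_mle = min(grid, key=lambda t: bernoulli_nll(t, flips))
print("MLE via grid search:", theta_mle)
```

The minimizer found on the grid matches the head fraction 60/100 from the frequentist example.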
Maximum A Posteriori (MAP) - Bayesian method
Given a set of observations $X=(x_1, x_2, ..., x_n)$ , a random sample that is independent and identically distributed (i.i.d.), the MAP estimate of $\theta$ can be expressed as:
$\begin{aligned} \hat{\theta}_{MAP} &= \arg\max_{\theta} P(\theta|X)\\ &= \arg\min_{\theta} - \log P(\theta|X)\\ &= \arg\min_{\theta} - \log P(X|\theta) - \log P(\theta) + \log P(X)\\ &= \arg\min_{\theta} - \log P(X|\theta) - \log P(\theta) \end{aligned}$
Suppose the prior is a zero-mean Gaussian distribution with variance $\sigma^2$ :
$P(\theta) = const \times e^{(-\frac{\theta^2}{2\sigma^2})}$
Then
$- \log P(\theta) = const +\frac{\theta^2}{2\sigma^2}$
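
Substituting this term into the MAP objective adds a quadratic penalty $\frac{\theta^2}{2\sigma^2}$ to the NLL. The sketch below illustrates the effect under an additional assumption that is not in the notes: each observation is modelled as $x_i \sim N(\theta, 1)$ , so the NLL is $\frac{1}{2}\sum_i (x_i-\theta)^2$ up to a constant. The data values and function names are made up for illustration.

```python
def map_objective(theta, xs, sigma):
    """-log P(X|theta) - log P(theta), with additive constants dropped.

    Assumes x_i ~ N(theta, 1) and a prior theta ~ N(0, sigma^2).
    """
    nll = 0.5 * sum((x - theta) ** 2 for x in xs)
    penalty = theta ** 2 / (2.0 * sigma ** 2)
    return nll + penalty


def map_closed_form(xs, sigma):
    """Zero of the derivative: sum(x_i - theta) = theta / sigma^2."""
    return sum(xs) / (len(xs) + 1.0 / sigma ** 2)


xs = [2.1, 1.9, 2.4, 1.6, 2.0]  # made-up observations
mle = sum(xs) / len(xs)         # without a prior, the estimate is the mean

grid = [k / 1000 for k in range(0, 3001)]  # candidates in [0, 3]
for sigma in (0.5, 1.0, 10.0):
    theta_grid = min(grid, key=lambda t: map_objective(t, xs, sigma))
    print(sigma, theta_grid, map_closed_form(xs, sigma))
```

In this sketch a small $\sigma$ (a narrow prior) pulls the MAP estimate toward zero, while a large $\sigma$ leaves it close to the MLE.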