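The example in the central limit theorem section of the Xiaomage Classroom statistics series mentioned the concept of standard error. Some readers were unclear about it, so this separate section explains standard error in the hope of giving everyone an intuitive understanding.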
Standard error
The standard error (SE) of a statistic (usually an estimate of a parameter) is the standard deviation of its sampling distribution. If the parameter or the statistic is the mean, it is called the standard error of the mean (SEM).
The sampling distribution of the sample mean is generated by repeatedly drawing samples from the population and recording the mean of each one. The recorded means form a distribution with its own mean and variance. Mathematically, for independent observations, the variance of this sampling distribution equals the variance of the population divided by the sample size. As a result, the larger the sample size, the more closely the sample means cluster around the population mean.
Therefore, the relationship between the standard error and the standard deviation is such that, for a given sample size, the standard error equals the standard deviation divided by the square root of the sample size. In other words, the standard error of the mean is a measure of the dispersion of sample means around the population mean.
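Put another way, "standard error" usually means the standard error of some statistic (typically an estimate of a distribution parameter, for example a parameter of a normal distribution), that is, the standard deviation of that statistic's sampling distribution.
Draw samples of size n from the population over and over. The means of the individual samples form a distribution with its own expectation and variance, and the variance of this sampling distribution equals the population variance divided by the sample size. As the sample size grows, the sample means get closer and closer to the population mean. So for a given sample size n, the standard error equals the standard deviation divided by the square root of n. In other words, the standard error of the sample mean measures how widely the sample means are dispersed around the population mean.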
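We know that the variance measures how widely a random variable is dispersed around its expectation.
We also know that the standard error of the sample mean measures how widely the sample means are dispersed around the population mean.
So if we treat the sample mean as a random variable, the standard error is simply the standard deviation of the random variable $\bar{x}$. To summarize (and to generalize beyond the mean), the standard error is the standard deviation of the sampling distribution.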
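The claim that the variance of the sampling distribution equals the population variance divided by the sample size can be checked with a short derivation. This is a sketch under the assumption that the observations $x_1, \dots, x_n$ are independent and share the same variance $\sigma^2$; the symbols $x_i$ and $\bar{x}$ are notation introduced here for illustration.

```latex
\begin{align*}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\operatorname{Var}(\bar{x})
  &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(x_i)
   = \frac{n\,\sigma^2}{n^2}
   = \frac{\sigma^2}{n} \\
\sigma_{\bar{x}} &= \sqrt{\operatorname{Var}(\bar{x})} = \frac{\sigma}{\sqrt{n}}
\end{align*}
```

The second line relies on independence: the variance of a sum of independent variables is the sum of their variances, and the constant factor $1/n$ comes out squared.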
Population
The standard error of the mean (SEM) can be expressed as:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
where
σ is the standard deviation of the population.
n is the size (number of observations) of the sample.
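As a small worked example of the population formula, the sketch below computes the SEM for a fair six-sided die. The helper name `population_sem` is hypothetical, and the die is assumed here only because the simulation later in this post draws integers from 1 to 6.

```python
import math


def population_sem(sigma: float, n: int) -> float:
    """Standard error of the mean for a known population sd and sample size n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)


# Variance of a discrete uniform distribution on 1..6 is (6**2 - 1) / 12
sigma_die = math.sqrt((6**2 - 1) / 12)

print(round(sigma_die, 4))                      # 1.7078
print(round(population_sem(sigma_die, 10), 4))  # 0.5401
```

With samples of 10 rolls, the sample means are therefore expected to scatter around 3.5 with a standard deviation of roughly 0.54.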
Estimate
Since the population standard deviation is seldom known, the standard error of the mean is usually estimated as the sample standard deviation divided by the square root of the sample size (assuming statistical independence of the values in the sample).
$$\sigma_{\bar{x}} \approx \frac{s}{\sqrt{n}}$$
where
s is the sample standard deviation (i.e., the sample-based estimate of the standard deviation of the population), and
n is the size (number of observations) of the sample.
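The estimate can be sketched in a few lines using only the standard library. The function name `estimated_sem` and the sample values are made up for illustration; `statistics.stdev` uses the n - 1 denominator, which matches the sample standard deviation s described above.

```python
import math
import statistics


def estimated_sem(sample):
    """Estimate the SEM as s / sqrt(n), where s is the sample standard deviation."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(sample)  # sample sd with the n - 1 denominator
    return s / math.sqrt(n)


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
print(estimated_sem(sample))
```

For this sample the squared deviations from the mean (5) sum to 32, so s² = 32/7 and the estimate equals the square root of 4/7, about 0.756.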
Code example
#!/usr/bin/env python3
#-*- coding:utf-8 -*-
#############################################
#File Name: standard_error.py
#Brief: an intuitive demonstration that the standard error formula is correct
#Author: frank
#Email: frank0903@aliyun.com
#Created Time:2018-08-09 20:29:10
#Blog: http://www.cnblogs.com/black-mamba
#Github: https://github.com/xiaomagejunfu0903/statistic_notes
#############################################
import random
import matplotlib.pyplot as plt
import numpy as np
population_size = 10000
# Alternative population: list(np.random.normal(size=population_size))
# Population: integers drawn uniformly from 1 to 6 (like rolls of a die)
list_population = list(np.random.randint(low=1, high=7, size=population_size))
#print("list_population:{},len:{}".format(list_population,len(list_population)))
# Population mean
mean_population = np.mean(list_population)
print("mean_population: %.6f" % mean_population)
# Population standard deviation
sigma = np.std(list_population, ddof=0)
print("standard deviation of population:{}".format(sigma))
# Plot the population distribution
plt.figure(1)
_, bins, _ = plt.hist(list_population, bins='auto', density=True)
# Normal density with the same mean and standard deviation, drawn for reference
y_population = ((1 / (np.sqrt(2 * np.pi) * sigma)) * np.exp(-0.5 * (1 / sigma * (bins - mean_population))**2))
plt.plot(bins, y_population, 'r--')
plt.title('population distribution')
# Raw string so that \m, \s and "\ " are not treated as escape sequences
text_comment = r"$\mu={},\ \sigma={}$".format(mean_population, sigma)
plt.text(1, .5, text_comment, {'color': 'r', 'fontsize': 15})
# Sampling distribution
# Estimate the standard error of the mean (SEM) empirically
def get_SEM(list_population, sample_size, sampling_times):
    # Draw sampling_times samples, each of size sample_size,
    # and record the mean of every sample
    list_sampling_mean = []
    for _ in range(sampling_times):
        samples = random.sample(list_population, sample_size)
        list_sampling_mean.append(np.mean(samples))
    print("size of list_sampling_mean:{}".format(len(list_sampling_mean)))
    # The standard deviation of the sample means is the empirical standard error
    sampling_sd = np.std(list_sampling_mean, ddof=0)
    print("standard deviation of the sampling mean:{}".format(sampling_sd))
    return list_sampling_mean, sampling_sd
# Sample size
sample_size = 10
# Number of samples drawn
sampling_times = 1000
# Theoretical standard error: sigma / sqrt(n)
theoretical_sem = sigma / np.sqrt(sample_size)
print("theoretical standard error:{}".format(theoretical_sem))
list_sampling_mean, sampling_sd = get_SEM(list_population, sample_size, sampling_times)
plt.figure(2)
_, bins, _ = plt.hist(list_sampling_mean, bins='auto', density=True)
# Normal density with the mean and standard deviation of the sample means
y_sampling = ((1 / (np.sqrt(2 * np.pi) * sampling_sd)) * np.exp(-0.5 * (1 / sampling_sd * (bins - np.mean(list_sampling_mean)))**2))
plt.plot(bins, y_sampling, 'r--')
plt.title('sampling distribution of the sample mean')
text_comment = r"real $\mu={},\ \sigma={}$".format(np.mean(list_sampling_mean), sampling_sd)
plt.text(2.0, 0.4, text_comment, {'color': 'r', 'fontsize': 15})
text_comment = "theoretical standard error of the mean:{}".format(theoretical_sem)
plt.text(2.0, 0.8, text_comment, {'color': 'm', 'fontsize': 15})
plt.show()


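The results above show that the variance of the sampling distribution equals the population variance divided by the sample size. As the sample size increases, the standard error gets smaller, which means the sample means cluster more tightly around the population mean. Increasing the number of repeated samples does not shrink the standard error itself; it only brings the empirically measured value closer to the theoretical one.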
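The effect of the sample size can be checked with a small standalone simulation. This is a sketch rather than the script above: it assumes an idealized fair die, draws with replacement through `random.choices`, and uses a hypothetical helper named `empirical_sem`; the seed and the sample sizes are arbitrary choices.

```python
import math
import random
import statistics


def empirical_sem(sample_size, sampling_times, rng):
    """Standard deviation of sampling_times sample means of sample_size die rolls."""
    faces = [1, 2, 3, 4, 5, 6]
    means = [
        statistics.fmean(rng.choices(faces, k=sample_size))
        for _ in range(sampling_times)
    ]
    return statistics.pstdev(means)


rng = random.Random(0)      # fixed seed so the run is repeatable
sigma = math.sqrt(35 / 12)  # population sd of a fair six-sided die

for sample_size in (10, 40, 160):
    theory = sigma / math.sqrt(sample_size)
    observed = empirical_sem(sample_size, 2000, rng)
    print(sample_size, round(theory, 4), round(observed, 4))
```

Each fourfold increase in the sample size halves the theoretical standard error, and the simulated values should track it closely. Raising `sampling_times` only makes the simulated value a more stable estimate of the same quantity.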