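Confidence Intervals for the Difference in Means
Questions:
1. For 10,000 iterations, bootstrap sample your sample data and compute the difference in average height between coffee drinkers and non-coffee drinkers. Build a 99% confidence interval using your sampling distribution. Use your interval to start answering the first quiz question below.
2. For 10,000 iterations, bootstrap sample your sample data and compute the difference in average height between those older than 21 and those younger than 21. Build a 99% confidence interval using your sampling distribution. Use your interval to finish answering the first quiz question below.
3. For 10,000 iterations, bootstrap sample your sample data and compute the difference between the average height of coffee drinkers and the average height of non-coffee drinkers for individuals under 21. Build a 95% confidence interval using your sampling distribution. Use your interval to start answering the second quiz question below.
4. For 10,000 iterations, bootstrap sample your sample data and compute the difference between the average height of coffee drinkers and the average height of non-coffee drinkers for individuals over 21. Build a 95% confidence interval using your sampling distribution. Use your interval to finish answering the second quiz question below, as well as the questions that follow.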
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
np.random.seed(42)
full_data = pd.read_csv('coffee_dataset.csv')
sample_data = full_data.sample(200)
sample_data.head()
- For 10,000 iterations, bootstrap sample your sample data and compute the difference in average height between coffee drinkers and non-coffee drinkers. Build a 99% confidence interval using your sampling distribution. Use your interval to start answering the first quiz question below.
diffs = []
for _ in range(10000):
    # Resample the 200 rows with replacement
    bootsamp = sample_data.sample(200, replace = True)
    coff_mean = bootsamp[bootsamp['drinks_coffee'] == True]['height'].mean()
    nocoff_mean = bootsamp[bootsamp['drinks_coffee'] == False]['height'].mean()
    # Coffee minus non-coffee: positive values mean coffee drinkers are taller
    diffs.append(coff_mean - nocoff_mean)
np.percentile(diffs, 0.5), np.percentile(diffs, 99.5)
# statistical evidence coffee drinkers are on average taller
plt.hist(diffs)
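The loop above can be wrapped in a reusable helper. The following is a minimal sketch, not the notebook's own method: the function name `bootstrap_mean_diff` and the synthetic `toy` frame are assumptions made for illustration, and NumPy index resampling is swapped in for `DataFrame.sample` so the example runs without `coffee_dataset.csv`.

```python
import numpy as np
import pandas as pd

def bootstrap_mean_diff(df, mask_col, value_col, n_iter=2000, seed=0):
    """Bootstrap replicates of mean(value | mask) - mean(value | not mask)."""
    rng = np.random.default_rng(seed)
    values = df[value_col].to_numpy(dtype=float)
    mask = df[mask_col].to_numpy(dtype=bool)
    n = len(df)
    diffs = np.empty(n_iter)
    for i in range(n_iter):
        idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
        v, m = values[idx], mask[idx]
        diffs[i] = v[m].mean() - v[~m].mean()
    return diffs

# Synthetic stand-in for sample_data (assumption: not the real coffee data).
data_rng = np.random.default_rng(42)
toy = pd.DataFrame({"drinks_coffee": np.repeat([True, False], 100)})
toy["height"] = np.where(toy["drinks_coffee"], 70.0, 65.0) + data_rng.normal(0, 3, size=200)

diffs = bootstrap_mean_diff(toy, "drinks_coffee", "height")
# 99% percentile interval: cut 0.5% from each tail
lower, upper = np.percentile(diffs, [0.5, 99.5])
```

On the real data, the same helper could be called as `bootstrap_mean_diff(sample_data, 'drinks_coffee', 'height', n_iter=10000)`, assuming the `drinks_coffee` column is boolean as the comparisons in the notebook suggest.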
- For 10,000 iterations, bootstrap sample your sample data and compute the difference in average height between those older than 21 and those younger than 21. Build a 99% confidence interval using your sampling distribution. Use your interval to finish answering the first quiz question below.
diffs_age = []
for _ in range(10000):
    # Resample the 200 rows with replacement
    bootsamp = sample_data.sample(200, replace = True)
    under21_mean = bootsamp[bootsamp['age'] == '<21']['height'].mean()
    over21_mean = bootsamp[bootsamp['age'] != '<21']['height'].mean()
    # Over-21 minus under-21: positive values mean the older group is taller
    diffs_age.append(over21_mean - under21_mean)
np.percentile(diffs_age, 0.5), np.percentile(diffs_age, 99.5)
# statistical evidence that over21 are on average taller
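The percentile cutoffs used in this notebook follow directly from the confidence level: a 99% interval trims 0.5% from each tail (percentiles 0.5 and 99.5), and a 95% interval trims 2.5% (percentiles 2.5 and 97.5). A small sketch of that arithmetic, together with the "does the interval exclude zero" check that the comments rely on, is shown here. The helper names and the synthetic replicates are assumptions for illustration only.

```python
import numpy as np

def percentile_bounds(level):
    """Map a confidence level such as 0.99 to (lower, upper) percentile cutoffs."""
    tail = (1.0 - level) / 2.0 * 100.0
    return tail, 100.0 - tail

def excludes_zero(diffs, level):
    """True when the percentile interval lies entirely on one side of 0."""
    lo_pct, hi_pct = percentile_bounds(level)
    lower, upper = np.percentile(diffs, [lo_pct, hi_pct])
    return bool(lower > 0 or upper < 0)

# Synthetic replicates (assumption: illustrative values, not notebook output).
rng = np.random.default_rng(0)
shifted = rng.normal(loc=4.0, scale=0.5, size=10_000)   # centered well above 0
centered = rng.normal(loc=0.0, scale=0.5, size=10_000)  # centered on 0
```

With these helpers, `excludes_zero(diffs_age, 0.99)` would express the same reading of the interval as the comment above, under the assumption that `diffs_age` holds the replicates from the loop.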
- For 10,000 iterations, bootstrap sample your sample data and compute the difference between the average height of coffee drinkers and the average height of non-coffee drinkers for individuals under 21 years old. Using your sampling distribution, build a 95% confidence interval. Use your interval to start answering the second quiz question below.
diffs_coff_under21 = []
for _ in range(10000):
    # Resample the 200 rows with replacement
    bootsamp = sample_data.sample(200, replace = True)
    under21_coff_mean = bootsamp.query("age == '<21' and drinks_coffee == True")['height'].mean()
    under21_nocoff_mean = bootsamp.query("age == '<21' and drinks_coffee == False")['height'].mean()
    # Non-coffee minus coffee: positive values mean non-coffee drinkers are taller
    diffs_coff_under21.append(under21_nocoff_mean - under21_coff_mean)
np.percentile(diffs_coff_under21, 2.5), np.percentile(diffs_coff_under21, 97.5)
# For the under21 group, we have evidence that the non-coffee drinkers are on average taller
- For 10,000 iterations, bootstrap sample your sample data and compute the difference between the average height of coffee drinkers and the average height of non-coffee drinkers for individuals over 21 years old. Using your sampling distribution, build a 95% confidence interval. Use your interval to finish answering the second quiz question below, as well as the questions that follow.
diffs_coff_over21 = []
for _ in range(10000):
    # Resample the 200 rows with replacement
    bootsamp = sample_data.sample(200, replace = True)
    over21_coff_mean = bootsamp.query("age != '<21' and drinks_coffee == True")['height'].mean()
    over21_nocoff_mean = bootsamp.query("age != '<21' and drinks_coffee == False")['height'].mean()
    # Non-coffee minus coffee: positive values mean non-coffee drinkers are taller
    diffs_coff_over21.append(over21_nocoff_mean - over21_coff_mean)
np.percentile(diffs_coff_over21, 2.5), np.percentile(diffs_coff_over21, 97.5)
# For the over21 group, we have evidence that on average the non-coffee drinkers are taller
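The two age-stratified loops differ only in the age filter, so both intervals can be computed from the same bootstrap samples in one pass. The sketch below is one possible way to do that under stated assumptions: `stratified_diffs` is a hypothetical helper, the `toy` frame is synthetic (with an effect deliberately built in), and a NaN guard is added because a small subgroup can be missing from a particular resample, which would otherwise propagate NaN into the percentiles.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for sample_data (assumption: same column names and age labels).
rng = np.random.default_rng(1)
n = 400
toy = pd.DataFrame({
    "age": rng.choice(["<21", ">=21"], size=n),
    "drinks_coffee": rng.choice([True, False], size=n),
})
# Built-in effect for the demo: non-coffee drinkers are 4 units taller in both groups.
toy["height"] = rng.normal(66.0, 3.0, size=n) + np.where(toy["drinks_coffee"], 0.0, 4.0)

def stratified_diffs(df, n_iter=500, seed=0):
    """Non-coffee minus coffee mean height, per age group, for each resample."""
    boot_rng = np.random.default_rng(seed)
    ages = df["age"].to_numpy()
    coffee = df["drinks_coffee"].to_numpy(dtype=bool)
    height = df["height"].to_numpy(dtype=float)
    labels = sorted(df["age"].unique())
    out = {label: np.full(n_iter, np.nan) for label in labels}
    for i in range(n_iter):
        idx = boot_rng.integers(0, len(df), size=len(df))  # rows with replacement
        a, c, h = ages[idx], coffee[idx], height[idx]
        for label in labels:
            no_coffee = h[(a == label) & ~c]
            with_coffee = h[(a == label) & c]
            # Leave NaN in place if either subgroup is absent from this resample
            if len(no_coffee) and len(with_coffee):
                out[label][i] = no_coffee.mean() - with_coffee.mean()
    return out

reps = stratified_diffs(toy)
# 95% percentile intervals, ignoring any NaN replicates
intervals = {label: tuple(np.nanpercentile(d, [2.5, 97.5])) for label, d in reps.items()}
```

Computing both strata from the same resamples keeps the two intervals directly comparable; whether that matters for the quiz questions is a judgment call, and separate loops as in the notebook give equally valid intervals.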