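Difference in Means

Confidence Intervals for a Difference in Means
Questions:
1. For 10,000 iterations, bootstrap sample your sample data and compute the difference in average height between coffee drinkers and non-coffee drinkers. Build a 99% confidence interval from your sampling distribution. Use your interval to start answering the first quiz question below.

2. For 10,000 iterations, bootstrap sample your sample data and compute the difference in average height between those older than 21 and those younger than 21. Build a 99% confidence interval from your sampling distribution. Use your interval to finish answering the first quiz question below.

3. For 10,000 iterations, bootstrap sample your sample data and compute the difference between the average height of coffee drinkers and the average height of non-coffee drinkers among individuals under 21. Build a 95% confidence interval from your sampling distribution. Use your interval to start answering the second quiz question below.

4. For 10,000 iterations, bootstrap sample your sample data and compute the difference between the average height of coffee drinkers and the average height of non-coffee drinkers among individuals over 21. Build a 95% confidence interval from your sampling distribution. Use your interval to finish answering the second quiz question below, as well as the questions that follow.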
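The 99% and 95% intervals requested above correspond to different percentile cutoffs on the bootstrap distribution. As a small sketch of that arithmetic (the helper name `percentile_bounds` is hypothetical and not part of the notebook), a two-sided interval splits the leftover probability evenly between the two tails:

```python
def percentile_bounds(confidence):
    """Return the (lower, upper) percentiles for a two-sided interval.

    Hypothetical helper for illustration only; `confidence` is a
    fraction such as 0.95 or 0.99.
    """
    tail = (1.0 - confidence) / 2.0 * 100.0  # percent left in each tail
    return tail, 100.0 - tail


for level in (0.95, 0.99):
    lower, upper = percentile_bounds(level)
    print(f"{level:.0%} interval: percentiles {lower:.1f} and {upper:.1f}")
```

These are the values passed to `np.percentile` in the cells below: 0.5 and 99.5 for the 99% intervals, 2.5 and 97.5 for the 95% intervals.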

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
np.random.seed(42)

full_data = pd.read_csv('coffee_dataset.csv')
sample_data = full_data.sample(200)
sample_data.head()
  1. For 10,000 iterations, bootstrap sample your sample data and compute the difference in the average heights for coffee and non-coffee drinkers. Build a 99% confidence interval using your sampling distribution. Use your interval to start answering the first quiz question below.
# Bootstrap the difference in mean height: coffee drinkers minus non-coffee drinkers
diffs = []
for _ in range(10000):
    bootsamp = sample_data.sample(200, replace = True)
    coff_mean = bootsamp[bootsamp['drinks_coffee'] == True]['height'].mean()
    nocoff_mean = bootsamp[bootsamp['drinks_coffee'] == False]['height'].mean()
    diffs.append(coff_mean - nocoff_mean)

# 99% interval: cut 0.5% from each tail
np.percentile(diffs, 0.5), np.percentile(diffs, 99.5)
# statistical evidence that coffee drinkers are on average taller
plt.hist(diffs)
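The same resample, compute, take percentiles pattern can be sketched without pandas. The version below is a minimal illustration that assumes synthetic heights, not the values in `coffee_dataset.csv`; the function name `bootstrap_diff_ci` and its parameters are hypothetical. It picks percentiles by nearest rank on the sorted differences, which is a simpler approximation than the interpolation `np.percentile` uses by default.

```python
import random
import statistics


def bootstrap_diff_ci(group_a, group_b, confidence=0.95, iterations=2000, seed=42):
    """Percentile bootstrap interval for mean(group_a) - mean(group_b).

    Rows are resampled together with their group label, mirroring
    sample_data.sample(200, replace=True) in the notebook.
    """
    rng = random.Random(seed)
    rows = [(x, True) for x in group_a] + [(x, False) for x in group_b]
    diffs = []
    while len(diffs) < iterations:
        resample = rng.choices(rows, k=len(rows))  # sample with replacement
        a = [x for x, is_a in resample if is_a]
        b = [x for x, is_a in resample if not is_a]
        if not a or not b:
            continue  # skip the rare resample that misses a whole group
        diffs.append(statistics.fmean(a) - statistics.fmean(b))
    diffs.sort()
    tail = (1.0 - confidence) / 2.0
    lower = diffs[int(tail * (iterations - 1))]          # nearest-rank lower bound
    upper = diffs[int((1.0 - tail) * (iterations - 1))]  # nearest-rank upper bound
    return lower, upper


# Synthetic heights, for illustration only (not the coffee dataset).
data_rng = random.Random(0)
group_a = [data_rng.gauss(68.0, 3.0) for _ in range(120)]
group_b = [data_rng.gauss(66.0, 3.0) for _ in range(80)]

lower, upper = bootstrap_diff_ci(group_a, group_b, confidence=0.99)
print(f"99% interval for the difference in means: ({lower:.2f}, {upper:.2f})")
print("interval excludes 0:", lower > 0 or upper < 0)
```

Resampling whole rows, rather than each group separately, lets the group sizes vary from one bootstrap sample to the next, which is also what happens in the notebook's loops.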
  2. For 10,000 iterations, bootstrap sample your sample data and compute the difference in the average heights for those older than 21 and those younger than 21. Build a 99% confidence interval using your sampling distribution. Use your interval to finish answering the first quiz question below.
diffs_age = []
for _ in range(10000):
    bootsamp = sample_data.sample(200, replace = True)
    under21_mean = bootsamp[bootsamp['age'] == '<21']['height'].mean()
    over21_mean = bootsamp[bootsamp['age'] != '<21']['height'].mean()
    diffs_age.append(over21_mean - under21_mean)

np.percentile(diffs_age, 0.5), np.percentile(diffs_age, 99.5)
# statistical evidence that over21 are on average taller
  3. For 10,000 iterations, bootstrap your sample data and compute the difference between the average height of coffee drinkers and the average height of non-coffee drinkers for individuals under 21 years old. Using your sampling distribution, build a 95% confidence interval. Use your interval to start answering question 2 below.
diffs_coff_under21 = []
for _ in range(10000):
    bootsamp = sample_data.sample(200, replace = True)
    under21_coff_mean = bootsamp.query("age == '<21' and drinks_coffee == True")['height'].mean()
    under21_nocoff_mean = bootsamp.query("age == '<21' and drinks_coffee == False")['height'].mean()
    diffs_coff_under21.append(under21_nocoff_mean - under21_coff_mean)

np.percentile(diffs_coff_under21, 2.5), np.percentile(diffs_coff_under21, 97.5)
# For the under21 group, we have evidence that the non-coffee drinkers are on average taller
  4. For 10,000 iterations, bootstrap your sample data and compute the difference between the average height of coffee drinkers and the average height of non-coffee drinkers for individuals over 21 years old. Using your sampling distribution, build a 95% confidence interval. Use your interval to finish answering the second quiz question below, as well as the following questions.
diffs_coff_over21 = []
for _ in range(10000):
    bootsamp = sample_data.sample(200, replace = True)
    over21_coff_mean = bootsamp.query("age != '<21' and drinks_coffee == True")['height'].mean()
    over21_nocoff_mean = bootsamp.query("age != '<21' and drinks_coffee == False")['height'].mean()
    diffs_coff_over21.append(over21_nocoff_mean - over21_coff_mean)
np.percentile(diffs_coff_over21, 2.5), np.percentile(diffs_coff_over21, 97.5)
# For the over21 group, we have evidence that on average the non-coffee drinkers are taller
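Each conclusion in the comments above comes from checking which side of 0 the interval falls on, and the sign convention matters: the first two loops store (coffee - non-coffee) and (over 21 - under 21), while the last two store (non-coffee - coffee). A minimal sketch of that reading follows; the helper name `interpret_interval` is hypothetical, and the numbers in the calls are made up for illustration, not results from the notebook.

```python
def interpret_interval(lower, upper, minuend, subtrahend):
    """Describe what an interval for (minuend - subtrahend) suggests.

    Hypothetical helper: `lower` and `upper` are the interval bounds,
    and the two labels name the groups in the order they were subtracted.
    """
    if lower > 0:
        return f"{minuend} are taller on average"
    if upper < 0:
        return f"{subtrahend} are taller on average"
    return "no clear difference (interval contains 0)"


# Illustrative bounds only; substitute the values printed by np.percentile.
print(interpret_interval(0.4, 2.1, "coffee drinkers", "non-coffee drinkers"))
print(interpret_interval(-1.2, 0.3, "non-coffee drinkers", "coffee drinkers"))
```

Passing the labels in the same order as the subtraction keeps the sign convention explicit, which helps when the overall comparison and the per-age-group comparisons are computed in opposite directions.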