R Language -- 20/21/22

Correlation with cor:

cor():
cor(x, y , method = c("pearson", "kendall", "spearman"))
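Computes the correlation between x and y. x and y should be vectors; if they are matrices, the correlations between the columns of x and the columns of y are computed. method selects how the correlation is calculated; the default is the Pearson correlation coefficient.
For how the Pearson and Spearman coefficients are calculated, see the notebook.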
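
As a rough sketch of what the two most common methods compute: Pearson's coefficient is the covariance divided by the product of the two standard deviations, and Spearman's coefficient is Pearson's coefficient applied to the ranks. The vectors x and y below are made-up illustration data, not values from these notes.

```r
# Illustration data (hypothetical, not from the notes)
x <- c(1.2, 2.5, 3.1, 4.8, 5.0, 7.3)
y <- c(2.0, 2.9, 3.5, 6.1, 5.2, 9.4)

# Pearson: covariance scaled by both standard deviations
pearson_manual <- cov(x, y) / (sd(x) * sd(y))

# Spearman: Pearson correlation computed on the ranks
spearman_manual <- cor(rank(x), rank(y))

# Both should agree with the built-in methods
all.equal(pearson_manual, cor(x, y, method = "pearson"))
all.equal(spearman_manual, cor(x, y, method = "spearman"))
```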
cor.test():


[Figure: how to use cor.test()]

cor() returns only the correlation value, whereas cor.test() also returns the p-value of the test.
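
A minimal sketch of the difference, using the height and weight columns of the `test` data frame that appears in the lm() section of these notes. The variable names `r` and `ct` are chosen only for this example.

```r
# Height and weight values taken from the `test` data frame in the lm() section
height <- c(180, 165, 175, 173, 160, 165)
weight <- c(75, 58, 72, 68, 60, 55)

r  <- cor(height, weight)        # correlation coefficient only
ct <- cor.test(height, weight)   # coefficient plus a significance test

r
ct$estimate   # same value as r
ct$p.value    # p-value for the null hypothesis that the true correlation is 0
ct$conf.int   # 95% confidence interval for the correlation
```

For a simple regression with one predictor, this p-value should match the slope's p-value reported by summary() on the corresponding lm() fit.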

Linear fitting with lm()

lm() returns an object holding the fit results; use summary() to view its contents.

> test
      height weight gender      BMI
tom      180     75   male 23.14815
cindy    165     58 female 21.30395
jimmy    175     72   male 23.51020
sam      173     68   male 22.72044
lucy     160     60 female 23.43750
lily     165     55 female 20.20202
> result=lm(height~weight,data=test)
> result

Call:
lm(formula = height ~ weight, data = test)

Coefficients:
(Intercept)       weight  
     115.34         0.84  
# the fitted equation is height = 115.34 + 0.84 * weight
> summary(result)

Call:
lm(formula = height ~ weight, data = test)

Residuals:
    tom   cindy   jimmy     sam    lucy    lily 
 1.6529  0.9336 -0.8270  0.5332 -5.7465  3.4537 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  # Pr(>|t|) is the p-value (significance)
(Intercept) 115.3441    12.5825   9.167 0.000786 ***
weight        0.8400     0.1933   4.346 0.012198 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.519 on 4 degrees of freedom
Multiple R-squared:  0.8252,    Adjusted R-squared:  0.7815 
# R-squared (goodness of fit) and adjusted R-squared; the larger, the better
F-statistic: 18.89 on 1 and 4 DF,  p-value: 0.0122
# p-value for the model as a whole; below 0.05 is desirable
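
The numbers printed by summary() can also be pulled out of the fitted object directly. The sketch below rebuilds only the height and weight columns of the `test` data frame; the new weight of 70 used for prediction is an arbitrary illustration value.

```r
# Rebuild the height/weight columns of the `test` data frame shown above
test <- data.frame(
  height = c(180, 165, 175, 173, 160, 165),
  weight = c(75, 58, 72, 68, 60, 55),
  row.names = c("tom", "cindy", "jimmy", "sam", "lucy", "lily")
)

result <- lm(height ~ weight, data = test)
s <- summary(result)

coef(result)      # intercept and slope
s$r.squared       # multiple R-squared
s$coefficients    # estimates, standard errors, t values, p-values
residuals(result) # one residual per row of the data

# Predict height for a new weight (70 is an arbitrary illustration value)
pred <- predict(result, newdata = data.frame(weight = 70))
pred
```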

Testing whether a data set follows a normal distribution

Case: data that follow a normal distribution

Method 1

> data=rnorm(1000)
> hist(data,prob=T)
> lines(density(data))
[Figure: output]

Plot the distribution of the data; if it roughly follows a bell shape, the data can be judged to be approximately normal.
Method 2

qqnorm(data)
qqline(data)


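[Figure: normal Q-Q plot output]

If the points fall roughly along the reference line (y=x for standard normal data), the data can be judged to be normally distributed.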
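
To make the Q-Q plot less of a black box, here is a small sketch of what qqnorm() and qqline() compute with their default settings. The seed is arbitrary and only makes the example repeatable.

```r
set.seed(1)                 # arbitrary seed so the sketch is repeatable
data <- rnorm(1000)

qq <- qqnorm(data, plot.it = FALSE)   # compute the points without drawing
n  <- length(data)

# Theoretical quantiles are standard normal quantiles at ppoints(n)
all.equal(sort(qq$x), qnorm(ppoints(n)))
# Sample quantiles are simply the data values
all.equal(sort(qq$y), sort(data))

# By default qqline() draws the line through the first and third quartiles
slope     <- diff(quantile(data, c(0.25, 0.75))) / diff(qnorm(c(0.25, 0.75)))
intercept <- quantile(data, 0.25) - slope * qnorm(0.25)
c(intercept = unname(intercept), slope = unname(slope))
```

For standard normal data the slope should be close to 1 and the intercept close to 0, which is why the reference line looks like y=x in this case.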
Method 3

> shapiro.test(data)

    Shapiro-Wilk normality test

data:  data
W = 0.99786, p-value = 0.229
# p > 0.05: consistent with a normal distribution; p < 0.05: not normally distributed
Case: data that do not follow a normal distribution

Method 1

> a=c(rep(1,10),rep(2,5),rep(3,4),6,8,10,12,20)
> hist(a,breaks=seq(0.5,21,by=1),prob=TRUE)
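> lines(density(a),col="blue")  # clearly a skewed distribution
> abline(v=median(a),col="red")  # add the median as a red line
> abline(v=mean(a),col="green")  # add the mean as a green line
> # for a skewed distribution, the median is more informative than the mean
[Figure: Method 1 output]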

Method 2

> qqnorm(a)
> qqline(a)
[Figure: Method 2 output]

Clearly a skewed distribution.
Method 3

> shapiro.test(a)

    Shapiro-Wilk normality test

data:  a
W = 0.63575, p-value = 1.589e-06
# p-value < 0.05: skewed, not normally distributed

For data that are not normally distributed, the t-test should not be used; use the rank-sum test instead.

> a=c(rep(1,10),rep(2,5),rep(3,4),6,8,10,12,20)
> b=c(rep(2,7),rep(3,5),rep(5,8),8,10,18,25)
# neither a nor b is normally distributed
> t.test(a,b)

    Welch Two Sample t-test

data:  a and b
t = -1.2025, df = 44.761, p-value = 0.2355
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.681665  1.181665
sample estimates:
mean of x mean of y 
 3.666667  5.416667 
# the data are not normally distributed, so the t-test is not appropriate; here the p-value is > 0.05, no significant difference
> wilcox.test(a,b,exact = FALSE)  # rank-sum test, a nonparametric test

    Wilcoxon rank sum test with continuity correction

data:  a and b
W = 162.5, p-value = 0.008673
alternative hypothesis: true location shift is not equal to 0
# for data that are not normally distributed, use the rank-sum test; here the p-value is < 0.05, a significant difference
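
The decision rule described above can be sketched as a small helper. `compare_two_groups` is a hypothetical function written for this example, not part of base R, and picking a test from a preliminary Shapiro-Wilk result is a simplification of how the choice is usually made.

```r
a <- c(rep(1, 10), rep(2, 5), rep(3, 4), 6, 8, 10, 12, 20)
b <- c(rep(2, 7), rep(3, 5), rep(5, 8), 8, 10, 18, 25)

# Hypothetical helper (not a base R function): run Shapiro-Wilk on both
# samples, then use the t-test only if neither sample rejects normality.
compare_two_groups <- function(x, y, alpha = 0.05) {
  normal <- shapiro.test(x)$p.value > alpha &&
            shapiro.test(y)$p.value > alpha
  if (normal) {
    list(method = "t.test", p.value = t.test(x, y)$p.value)
  } else {
    list(method = "wilcox.test",
         p.value = wilcox.test(x, y, exact = FALSE)$p.value)
  }
}

res <- compare_two_groups(a, b)
res$method
res$p.value
```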

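Proportion test

Suppose we know that the global influenza mortality rate is 10%, and a survey in the United States finds that 51 of 400 influenza patients died. Is the US influenza mortality rate significantly higher than the global rate?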

> prop.test(51,400,p=0.1,alternative = "greater")

    1-sample proportions test with continuity correction

data:  51 out of 400, null probability 0.1
X-squared = 3.0625, df = 1, p-value = 0.04006
alternative hypothesis: true p is greater than 0.1
95 percent confidence interval:
 0.101422 1.000000
sample estimates:
     p 
0.1275 
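
As a sketch of where the statistic and p-value above come from, the one-sample test with continuity correction can be reproduced by hand. The variable names are chosen for this example.

```r
x  <- 51    # deaths observed
n  <- 400   # patients surveyed
p0 <- 0.1   # null (global) mortality rate

# Chi-squared statistic for a single proportion, with continuity correction
stat <- (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))

# One-sided p-value: upper tail of the standard normal at sqrt(stat),
# valid here because the observed proportion is above p0
p_greater <- pnorm(sqrt(stat), lower.tail = FALSE)

pt <- prop.test(x, n, p = p0, alternative = "greater")
c(manual = stat, prop.test = unname(pt$statistic))
c(manual = p_greater, prop.test = pt$p.value)
```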

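Chi-squared test and Fisher's exact test

[Figure: chi-squared test example, testing whether tracheitis incidence differs significantly between smokers and non-smokers]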
> data=rbind(c(50,250),c(8,10))
> rownames(data)=c("cig","non-cig")
> colnames(data)=c("qiguanyan","non-qiguanyan")
> chisq.test(data)  # chi-squared test; note that the contingency table (a matrix) is passed in directly

    Pearson's Chi-squared test with Yates' continuity correction

data:  data
X-squared = 7.0225, df = 1, p-value = 0.008049

Warning message:
In chisq.test(data) : Chi-squared approximation may be incorrect
# This is a warning rather than an error: some cells have too few samples (expected counts below 5). When it appears, use Fisher's exact test, as below
> fisher.test(data)  # Fisher's exact test

    Fisher's Exact Test for Count Data

data:  data
p-value = 0.007489
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.08452923 0.77224505
sample estimates:
odds ratio 
   0.25151 
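
The approximation warning from chisq.test() is tied to small expected counts, which can be inspected directly. The sketch below applies the common rule of thumb (any expected count below 5) to choose between the two tests; the rule is a convention, not something enforced by R.

```r
data <- rbind(c(50, 250), c(8, 10))
rownames(data) <- c("cig", "non-cig")
colnames(data) <- c("qiguanyan", "non-qiguanyan")

# suppressWarnings() only silences the approximation warning for this check
ct <- suppressWarnings(chisq.test(data))
ct$expected                 # expected counts under independence

# Rule of thumb: prefer Fisher's exact test if any expected count is < 5
if (any(ct$expected < 5)) {
  p <- fisher.test(data)$p.value
} else {
  p <- ct$p.value
}
p
```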
[Figure: chi-squared test example, testing whether the numbers of men and women at different cancer stages differ]

ANOVA example

step1: take a rough look at the data

> cholesterol
      trt response
1   1time   3.8612
2   1time  10.3868
3   1time   5.9059
4   1time   3.0609
5   1time   7.7204
6   1time   2.7139
7   1time   4.9243
8   1time   2.3039
9   1time   7.5301
10  1time   9.4123
11 2times  10.3993
12 2times   8.6027
13 2times  13.6320
14 2times   3.5054
15 2times   7.7703
16 2times   8.6266
17 2times   9.2274
18 2times   6.3159
19 2times  15.8258
20 2times   8.3443
21 4times  13.9621
22 4times  13.9606
23 4times  13.9176
24 4times   8.0534
25 4times  11.0432
26 4times  12.3692
27 4times  10.3921
28 4times   9.0286
29 4times  12.8416
30 4times  18.1794
31  drugD  16.9819
32  drugD  15.4576
33  drugD  19.9793
34  drugD  14.7389
35  drugD  13.5850
36  drugD  10.8648
37  drugD  17.5897
38  drugD   8.8194
39  drugD  17.9635
40  drugD  17.6316
41  drugE  21.5119
42  drugE  27.2445
43  drugE  20.5199
44  drugE  15.7707
45  drugE  22.8850
46  drugE  23.9527
47  drugE  21.5925
48  drugE  18.3058
49  drugE  20.3851
50  drugE  17.3071
> boxplot(response~trt,data=cholesterol)
[Figure: output]

The plot suggests that the response differs between treatments.

step2: check normality and homogeneity of variance

> shapiro.test(cholesterol$response)
# test for normality; p > 0.05 means no significant departure, so the data are treated as normal

    Shapiro-Wilk normality test

data:  cholesterol$response
W = 0.97722, p-value = 0.4417

> bartlett.test(response~trt,data=cholesterol)
# test for homogeneity of variance; p > 0.05 means the variances do not differ significantly

    Bartlett test of homogeneity of variances

data:  response by trt
Bartlett's K-squared = 0.57975, df = 4, p-value = 0.9653

step3: analysis of variance

> fit<-aov(response~trt,data=cholesterol)
> summary(fit)
            Df Sum Sq Mean Sq F value   Pr(>F)    
trt          4 1351.4   337.8   32.43 9.82e-13 ***
Residuals   45  468.8    10.4                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
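# Pr(>F), the p-value, is below 0.001: highly significant, so the treatment groups differ and post-hoc multiple comparisons can follow

step4: post-hoc multiple comparisons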

> TukeyHSD(fit)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = response ~ trt, data = cholesterol)

$trt
                  diff        lwr       upr     p adj
2times-1time   3.44300 -0.6582817  7.544282 0.1380949
4times-1time   6.59281  2.4915283 10.694092 0.0003542
drugD-1time    9.57920  5.4779183 13.680482 0.0000003
drugE-1time   15.16555 11.0642683 19.266832 0.0000000
4times-2times  3.14981 -0.9514717  7.251092 0.2050382
drugD-2times   6.13620  2.0349183 10.237482 0.0009611
drugE-2times  11.72255  7.6212683 15.823832 0.0000000
drugD-4times   2.98639 -1.1148917  7.087672 0.2512446
drugE-4times   8.57274  4.4714583 12.674022 0.0000037
drugE-drugD    5.58635  1.4850683  9.687632 0.0030633
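# diff: difference between the means of the two groups
# lwr, upr: lower and upper bounds of the confidence interval (95% by default)
# p adj: p-value adjusted for multiple comparisons
# in this example, 2times-1time, 4times-2times and drugD-4times show no significant difference; all other pairs differ significantly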
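
Rather than reading the table by eye, the significant and non-significant pairs can be filtered from the TukeyHSD() result. In the sketch below the response values are retyped from the cholesterol listing in step1 so that it runs without extra packages; `chol_df`, `tk` and `sig` are names chosen for this example.

```r
# Response values retyped from the cholesterol listing above (10 per treatment)
response <- c(
   3.8612, 10.3868,  5.9059,  3.0609,  7.7204,  2.7139,  4.9243,  2.3039,  7.5301,  9.4123,
  10.3993,  8.6027, 13.6320,  3.5054,  7.7703,  8.6266,  9.2274,  6.3159, 15.8258,  8.3443,
  13.9621, 13.9606, 13.9176,  8.0534, 11.0432, 12.3692, 10.3921,  9.0286, 12.8416, 18.1794,
  16.9819, 15.4576, 19.9793, 14.7389, 13.5850, 10.8648, 17.5897,  8.8194, 17.9635, 17.6316,
  21.5119, 27.2445, 20.5199, 15.7707, 22.8850, 23.9527, 21.5925, 18.3058, 20.3851, 17.3071
)
trt <- factor(rep(c("1time", "2times", "4times", "drugD", "drugE"), each = 10))
chol_df <- data.frame(trt, response)   # name chosen for this sketch

fit <- aov(response ~ trt, data = chol_df)
tk  <- TukeyHSD(fit)$trt   # matrix with columns diff, lwr, upr, "p adj"

# Pairs whose adjusted p-value is at or above 0.05 (no significant difference)
rownames(tk)[tk[, "p adj"] >= 0.05]

# Pairs with a significant difference, sorted by adjusted p-value
sig <- tk[tk[, "p adj"] < 0.05, , drop = FALSE]
sig[order(sig[, "p adj"]), c("diff", "p adj")]
```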

step5: graphical display

> library(multcomp) 
> par(mar=c(5,4,6,2))
> tuk<-glht(fit,linfct=mcp(trt="Tukey"))
> plot(cld(tuk,level=0.05),col="lightgreen")
[Figure: output]

Groups that share a letter do not differ significantly.
