久久免费露脸丝袜国产,亚洲熟女久久,久久一区人妻黑

本文簡單介紹幾種重抽樣方法 (in R)。

Permutation
Bootstrap
Jackknife
Cross validation

我們生成一組數(shù)據(jù)，其中x是我們的觀測值，y是我們對其的標簽。

# generate random data
set.seed(1111)
x <- c(rnorm(10), rnorm(10, mean=5, sd=5))
y <- c(rep("A", 10), rep("B", 10))
df1 <- data.frame(x,y)
str(df1)

## 'data.frame':    20 obs. of  2 variables:
##  $ x: num  -0.0866 1.3225 0.6397 1.1748 0.1163 ...
##  $ y: chr  "A" "A" "A" "A" ...

Permutation

Permutation相當于是一種無放回的重抽樣方法，通常用于假設檢驗。

# single permutation 
set.seed(2222)
sample(df1$x, replace = FALSE)

##  [1]  9.31952342 -0.08658011  5.56155482  3.71174074  1.17478657  0.63970204
##  [7] -2.93084636  0.18759838  1.38404752  1.28394086  0.11629031  6.77816542
## [13]  1.11777194  0.11760093 -4.08500801  1.32252443  4.59743697 -2.67140783
## [19]  0.67750806  9.95418174

我們可以使用Permutation test檢驗A，B兩組的值是否有差異

# permutation test (n=1000)
set.seed(3333)
permt.ls <- list()
for (i in 1:1000) {
  permt.i <- sample(df1$x, replace = FALSE)
  # calculate the mean of difference from permutated samples
  diff.i <- abs(mean(permt.i[1:10]) - mean(permt.i[11:20]))
  permt.ls[[i]] <- c(permt.i, diff.i)
}
permt.df <- Reduce(rbind, permt.ls)
diff.raw <- abs(mean(df1$x[1:10]) - mean(df1$x[11:20]))
# Calculate the p-value
mean(permt.df[,21] >= diff.raw)

## [1] 0.079

每一次重抽樣后，我們都可以計算兩組樣本均值的差異。如果重抽樣樣本組間差異大于原始樣本組間差異的話，可以認為是一次錯誤事件，通過計算錯誤事件在總重抽樣次數(shù)中的占比就可以得到置換檢驗的p值。

在這里1000次重抽樣中，只有79次是錯誤事件，所以我們的p值為0.079

Bootstrap

Bootstrap是一種有放回的重抽樣方法，通常用于參數(shù)估計。

# single bootstrap
set.seed(4444)
sample(df1$x, replace = TRUE)

##  [1]  1.1177719  9.9541817 -2.9308464  0.1875984 -2.9308464 -4.0850080
##  [7] -4.0850080  4.5974370  0.1162903  0.6397020  6.7781654  4.5974370
## [13]  1.3225244  9.3195234  1.2839409 -4.0850080  0.6397020  1.1177719
## [19]  0.6775081  3.7117407

例如，我們用bootstrap估計總體的均值

set.seed(5555)
mean.raw <- mean(df1$x)
mean.i <- c()
# bootstrap (n=1000)
for (i in 1:1000) {
  boot.i <- sample(df1$x, replace = TRUE)
  mean.i[i] <- mean(boot.i)
}
mean.boost <- mean(mean.i)
mean.raw; mean.boost

## [1] 1.908527

## [1] 1.956184

此外，我們還可以計算bootstrap預測均值的標準誤（SE）

# calculate the standard error 
mean.se.boost <- sqrt(sum((mean.i - mean.boost)^2)/(1000-1))
mean.se.boost

## [1] 0.788522

Jackknife

Jakknife可以被認為是一種leave-one-out的重抽樣方法，對于大小為k的數(shù)據(jù)集，將產生k個大小為k-1的樣本.

jack.ls <- list()

for (i in 1:nrow(df1)) {
  jack.ls[[i]] <- df1[-i,]
}

length(jack.ls); dim(jack.ls[[1]])

## [1] 20

## [1] 19  2

Cross validation

交叉驗證將數(shù)據(jù)切分為測試集和驗證集，常在模型擬合中使用。例如k-fold cross validation將數(shù)據(jù)劃分為k組不重疊的數(shù)據(jù)集。

Figure: Illustration of 5-fold CV (https://yey.world/2020/08/31/MAST90083-05/).

# 1-fold cv
set.seed(6666)
training_size <- round(nrow(df1)*0.7)
training_idx <- sample(nrow(df1), size = training_size, replace = FALSE)
training_set <- df1[training_idx,]
val_set <- df1[-training_idx,]

nrow(training_set); nrow(val_set)

## [1] 14

## [1] 6

有時候，由于樣本量太小，無法滿足k-fold CV中不重疊分組的要求，有的樣本不可避免地被重復使用。

# 5-fold cv
set.seed(7777)
fold_idx <- list()
val_size <- nrow(df1) - training_size
all_idx <- 1:nrow(df1)
for (i in 1:5) {
  fold_idx_i <- unique(unlist(fold_idx))
  if (all(all_idx %in% fold_idx_i) | is.null(fold_idx_i)) {
    fold_idx[[i]] <- sample(all_idx, val_size, replace = FALSE)
  } else if (val_size < length(all_idx[-fold_idx_i])) {
    fold_idx[[i]] <- sample(all_idx[-fold_idx_i], val_size, replace = FALSE)
  } else if (val_size > length(all_idx[-fold_idx_i])) {
    fold_idx[[i]] <- c(all_idx[-fold_idx_i], sample(all_idx, val_size-length(all_idx[-fold_idx_i]), replace = FALSE))
  }
}
fold_data <- lapply(fold_idx, function(i) {
  list(training = df1[-i,], valdation = df1[i,])
})
fold_idx

## [[1]]
## [1] 16  3 19  5 15 14
## 
## [[2]]
## [1] 13  1 18  4  9 17
## 
## [[3]]
## [1]  7 20  6  8  2 12
## 
## [[4]]
## [1] 10 11 14 13  7 15
## 
## [[5]]
## [1]  3 10  2  5 15 12

以上就是對重抽樣方法的簡單介紹。

Ref:

https://stats.stackexchange.com/questions/104040/resampling-simulation-methods-monte-carlo-bootstrapping-jackknifing-cross

https://yey.world/2020/08/31/MAST90083-05/

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

R-抽樣

R-抽樣

Permutation

Bootstrap

Jackknife

Cross validation

相關閱讀更多精彩內容

友情鏈接更多精彩內容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

R-抽樣

Permutation

Bootstrap

Jackknife

Cross validation

相關閱讀更多精彩內容

友情鏈接更多精彩內容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av