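A small data-processing example

As a practical wrap-up of this chapter, here is the data-processing example from the book:

A group of students took exams in math, science, and English. To give every student a single performance measure, the scores from these subjects need to be combined. In addition, you want to assign an A to the top 20% of students, a B to the next 20%, and so on. Finally, the students are to be sorted alphabetically.

The data are shown in the table below:

Student	Math	Science	English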
John Davis	502	95	25
Angela Williams	600	99	22
Bullwinkle Moose	412	80	18
David Jones	358	82	15
Janice Markhammer	495	75	20
Cheryl Cushing	512	85	28
Reuven Ytzrhak	410	80	15
Greg Knox	625	95	30
Joel England	573	89	27
Mary Rayburn	522	86	18

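The task breaks down into three parts:

Standardize the scores in each subject;
Grade the combined score in 20% bands;
Sort the students alphabetically by name.

Step 1

We solve these in turn, starting by converting the table into an R data frame.

options(digits=2) sets the number of significant digits used when results are printed; it does not change the precision of the calculations themselves.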

options(digits=2)
Student <- c("John Davis", "Angela Williams", "Bullwinkle Moose",
             "David Jones", "Janice Markhammer", "Cheryl Cushing",
             "Reuven Ytzrhak", "Greg Knox", "Joel England",
             "Mary Rayburn")
Math <- c(502, 600, 412, 358, 495, 512, 410, 625, 573, 522)
Science <- c(95, 99, 80, 82, 75, 85, 80, 95, 89, 86)
English <- c(25, 22, 18, 15, 20, 28, 15, 30, 27, 18)
roster <- data.frame(Student, Math, Science, English,
                       stringsAsFactors=FALSE)
> roster
             Student Math Science English
1         John Davis  502      95      25
2    Angela Williams  600      99      22
3   Bullwinkle Moose  412      80      18
4        David Jones  358      82      15
5  Janice Markhammer  495      75      20
6     Cheryl Cushing  512      85      28
7     Reuven Ytzrhak  410      80      15
8          Greg Knox  625      95      30
9       Joel England  573      89      27
10      Mary Rayburn  522      86      18

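Step 2

The three subjects are scored on very different scales (math scores are in the hundreds, while English scores are in the tens), so summing or averaging the raw values would be inappropriate. The data need to be standardized before they can be combined and compared.

R provides the scale function, which expresses each value in units of standard deviation from its column mean rather than on the original scale.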

> z <- scale(roster[,2:4])
> z
        Math Science English
 [1,]  0.013   1.078   0.587
 [2,]  1.143   1.591   0.037
 [3,] -1.026  -0.847  -0.697
 [4,] -1.649  -0.590  -1.247
 [5,] -0.068  -1.489  -0.330
 [6,]  0.128  -0.205   1.137
 [7,] -1.049  -0.847  -1.247
 [8,]  1.432   1.078   1.504
 [9,]  0.832   0.308   0.954
[10,]  0.243  -0.077  -0.697
attr(,"scaled:center")
   Math Science English 
    501      87      22 
attr(,"scaled:scale")
   Math Science English 
   86.7     7.8     5.5
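To make the standardization concrete, here is a minimal sketch that reproduces the Math column by hand, assuming scale() is called with its defaults (center = TRUE, scale = TRUE). The name math_z is introduced only for this illustration.

```r
# Math scores from the roster above
Math <- c(502, 600, 412, 358, 495, 512, 410, 625, 573, 522)

# Standardize by hand: subtract the mean, then divide by the
# sample standard deviation
math_z <- (Math - mean(Math)) / sd(Math)

# scale() returns a one-column matrix here, so flatten it before comparing
all.equal(math_z, as.vector(scale(Math)))
```

After standardization each column has mean 0 and standard deviation 1, which is what makes the three subjects comparable.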

Step 3

Next, use the apply function to apply mean to each row. This gives each student's composite score as the mean of the three standardized scores. Then use cbind to add it to the original data frame:

score <- apply(z, 1, mean)
roster <- cbind(roster, score)
roster
             Student Math Science English score
1         John Davis  502      95      25  0.56
2    Angela Williams  600      99      22  0.92
3   Bullwinkle Moose  412      80      18 -0.86
4        David Jones  358      82      15 -1.16
5  Janice Markhammer  495      75      20 -0.63
6     Cheryl Cushing  512      85      28  0.35
7     Reuven Ytzrhak  410      80      15 -1.05
8          Greg Knox  625      95      30  1.34
9       Joel England  573      89      27  0.70
10      Mary Rayburn  522      86      18 -0.18
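As a side note, the second argument of apply selects the margin: 1 means rows, 2 means columns. The sketch below, on a three-student subset of the data, suggests that apply(z, 1, mean) gives the same result as base R's rowMeans; the names z_small, score_apply, and score_rowmeans are made up for this illustration.

```r
# A small subset of the roster: three students, two subjects
z_small <- scale(cbind(Math = c(502, 600, 412),
                       Science = c(95, 99, 80)))

# MARGIN = 1 applies mean() to each row
score_apply <- apply(z_small, 1, mean)

# rowMeans() computes the same per-row average directly
score_rowmeans <- rowMeans(z_small)

all.equal(score_apply, score_rowmeans)
```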

Step 4

Then use the quantile function to compute the cut points of the composite score at 20% intervals:

y <- quantile(roster$score, c(0.8, 0.6, 0.4, 0.2))
> y
  80%   60%   40%   20% 
 0.74  0.44 -0.36 -0.89

Step 5

Then use logical operators to recode the composite score into a letter grade, creating a new variable grade:

roster$grade[roster$score >= y[1]] <- 'A'
roster$grade[y[1] > roster$score & roster$score >= y[2] ] <- 'B'
roster$grade[y[2] > roster$score & roster$score >= y[3] ] <- 'C'
roster$grade[y[3] > roster$score & roster$score >= y[4] ] <- 'D'
roster$grade[roster$score < y[4]] <- 'E'
roster

             Student Math Science English score grade
1         John Davis  502      95      25  0.56     B
2    Angela Williams  600      99      22  0.92     A
3   Bullwinkle Moose  412      80      18 -0.86     D
4        David Jones  358      82      15 -1.16     E
5  Janice Markhammer  495      75      20 -0.63     D
6     Cheryl Cushing  512      85      28  0.35     C
7     Reuven Ytzrhak  410      80      15 -1.05     E
8          Greg Knox  625      95      30  1.34     A
9       Joel England  573      89      27  0.70     B
10      Mary Rayburn  522      86      18 -0.18     C

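A note on the statements above: assigning to roster$grade creates a new variable grade directly in the roster data frame, and [roster$score >= y[1]] is a logical subscript, so only the rows that satisfy the condition are assigned the value.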
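The same banding can also be written in one call with base R's cut function. This is an alternative sketch, not the book's method. It assumes the rounded scores printed above and uses right = FALSE so that each band includes its lower bound, matching the >= comparisons in the recoding statements; the names breaks and grade are introduced for this illustration.

```r
# Composite scores, copied (rounded) from the output above
score <- c(0.56, 0.92, -0.86, -1.16, -0.63, 0.35, -1.05, 1.34, 0.70, -0.18)

# Cut points at 20% intervals, padded with -Inf and Inf so every score
# falls into exactly one band
breaks <- c(-Inf, quantile(score, c(0.2, 0.4, 0.6, 0.8)), Inf)

# right = FALSE makes the intervals closed on the left: [lower, upper)
grade <- cut(score, breaks = breaks,
             labels = c("E", "D", "C", "B", "A"), right = FALSE)

as.character(grade)
```

cut returns a factor, so as.character is used here to get plain letters comparable to the grade column above.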

Step 6

To sort by name, first use the strsplit function to split each full name at the space. It returns a list:

> name<- strsplit(roster$Student, split = " ")
> str(name)
List of 10
 $ : chr [1:2] "John" "Davis"
 $ : chr [1:2] "Angela" "Williams"
 $ : chr [1:2] "Bullwinkle" "Moose"
 $ : chr [1:2] "David" "Jones"
 $ : chr [1:2] "Janice" "Markhammer"
 $ : chr [1:2] "Cheryl" "Cushing"
 $ : chr [1:2] "Reuven" "Ytzrhak"
 $ : chr [1:2] "Greg" "Knox"
 $ : chr [1:2] "Joel" "England"
 $ : chr [1:2] "Mary" "Rayburn"

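Use sapply() to extract the components of each list element into the variables Firstname and Lastname. "[" is a function that extracts part of an object, and the arguments 1 and 2 give the position to extract.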

Firstname <- sapply(name, "[", 1) 
Lastname <- sapply(name, "[", 2) 
roster <- cbind(Firstname, Lastname, roster[,-1])
roster
    Firstname   Lastname Math Science English score grade
1        John      Davis  502      95      25  0.56     B
2      Angela   Williams  600      99      22  0.92     A
3  Bullwinkle      Moose  412      80      18 -0.86     D
4       David      Jones  358      82      15 -1.16     E
5      Janice Markhammer  495      75      20 -0.63     D
6      Cheryl    Cushing  512      85      28  0.35     C
7      Reuven    Ytzrhak  410      80      15 -1.05     E
8        Greg       Knox  625      95      30  1.34     A
9        Joel    England  573      89      27  0.70     B
10       Mary    Rayburn  522      86      18 -0.18     C
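Passing "[" to sapply may look odd at first. The minimal sketch below, on two of the names, suggests it is equivalent to an anonymous function that indexes each element; the names parts, first, first_explicit, and last are made up for this illustration.

```r
# Split two full names at the space; the result is a list of character vectors
parts <- strsplit(c("John Davis", "Angela Williams"), split = " ")

# "[" is the subsetting function, so this calls `[`(element, 1) on each element
first <- sapply(parts, "[", 1)

# The same extraction written with an explicit anonymous function
first_explicit <- sapply(parts, function(p) p[1])

identical(first, first_explicit)

# Position 2 gives the last names
last <- sapply(parts, "[", 2)
```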

Step 7

Finally, use the order function to sort by last name and then by first name:

roster[order(Lastname,Firstname),]
    Firstname   Lastname Math Science English score grade
6      Cheryl    Cushing  512      85      28  0.35     C
1        John      Davis  502      95      25  0.56     B
9        Joel    England  573      89      27  0.70     B
4       David      Jones  358      82      15 -1.16     E
8        Greg       Knox  625      95      30  1.34     A
5      Janice Markhammer  495      75      20 -0.63     D
3  Bullwinkle      Moose  412      80      18 -0.86     D
10       Mary    Rayburn  522      86      18 -0.18     C
2      Angela   Williams  600      99      22  0.92     A
7      Reuven    Ytzrhak  410      80      15 -1.05     E
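No two students in the roster share a last name, so the second sort key never comes into play there. The sketch below uses a small hypothetical data frame (the people and their names are invented for this illustration) to show how order breaks ties on the first key with the second.

```r
# Hypothetical data: two people share the last name "Davis"
people <- data.frame(Firstname = c("Mary", "John", "Angela"),
                     Lastname  = c("Davis", "Davis", "Adams"),
                     stringsAsFactors = FALSE)

# order() returns row indices: sorted by Lastname first,
# with ties broken by Firstname
idx <- order(people$Lastname, people$Firstname)

# Indexing the rows with idx gives the sorted data frame
people[idx, ]
```

order returns positions rather than sorted values, which is why it is used inside the row subscript.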

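That completes this practical data-processing example.

In my view, reading the book up to this point is basically enough to get started with R: it covers the basic syntax, the data structures, and the basic ways of processing data. Beyond that, learn each package as you need it in practice.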


R in Action study notes, Chapter 5 (11): Advanced data management, a small data-processing example
