We use airquality, a test dataset built into R, to illustrate what long data and wide data are:
names(airquality) <- tolower(names(airquality))  # lowercase the column names to match the output below
head(airquality)
  ozone solar.r wind temp month day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
str(airquality)
'data.frame': 153 obs. of 6 variables:
 $ ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ solar.r: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ day    : int  1 2 3 4 5 6 7 8 9 10 ...
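Long data:
In long format, the airquality column names "ozone", "solar.r", "wind", "temp", "month" and "day" become entries in a variable column, and the value column holds the corresponding measurement for each observation. Data in this shape is well suited to visualization.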
library(reshape2)  # provides melt()
head(melt(airquality), n = 10)
No id variables; using all as measure variables
   variable value
1     ozone    41
2     ozone    36
3     ozone    12
4     ozone    18
5     ozone    NA
6     ozone    28
7     ozone    23
8     ozone    19
9     ozone     8
10    ozone    NA
Wide data:
Wide data is typically a data frame in which each variable is a column and each observation is a row.
head(airquality, n = 10)
   ozone solar.r wind temp month day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9
10    NA     194  8.6   69     5  10
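The relationship between the two layouts can be sketched as a round trip on a tiny table. The data frame `wide` and its values are made up for illustration; only `melt()` and `dcast()` come from reshape2.

```r
library(reshape2)

# A tiny wide table: one row per observation, one column per variable.
wide <- data.frame(
  day  = c(1, 2, 3),
  temp = c(67, 72, 74),
  wind = c(7.4, 8.0, 12.6)
)

# Wide -> long: `day` identifies the observation; the other columns are stacked
# into a `variable` column (the old column name) and a `value` column.
long <- melt(wide, id.vars = "day")
print(long)

# Long -> wide: one row per `day`, one column per level of `variable`.
wide_again <- dcast(long, day ~ variable, value.var = "value")
print(wide_again)
```

The long table has one row per (day, variable) pair, so three observations of two variables give six rows.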
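# 1. Working directory
setwd("reshape2")  # assumes a "reshape2" directory exists under the current working directory
# 2. Install and load
# install.packages("reshape2")
library(reshape2)
# 3. Function tests
help(package="reshape2")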
### 3.1 acast(): cast a molten data frame into an array or data frame
str(acast)
# function (data, formula, fun.aggregate = NULL, ..., margins = NULL, subset = NULL,
#     fill = NULL, drop = TRUE, value.var = guess_value(data))
names(airquality) <- tolower(names(airquality))
head(airquality)
#   ozone solar.r wind temp month day
# 1    41     190  7.4   67     5   1
# 2    36     118  8.0   72     5   2
# 3    12     149 12.6   74     5   3
# 4    18     313 11.5   62     5   4
# 5    NA      NA 14.3   56     5   5
# 6    28      NA 14.9   66     5   6
# aqm is the molten form of airquality (see 3.2); it must exist before casting
aqm <- melt(airquality, id = c("month", "day"), na.rm = TRUE)
head(acast(aqm, day ~ month ~ variable))
# , , ozone
#
#    5  6   7  8  9
# 1 41 NA 135 39 96
# 2 36 NA  49  9 78
# 3 12 NA  32 16 73
# 4 18 NA  NA 78 91
# 5 NA NA  64 35 47
# 6 28 NA  40 66 32
#
# , , solar.r
#
#     5   6   7  8   9
# 1 190 286 269 83 167
# 2 118 287 248 24 197
# 3 149 242 236 77 183
# 4 313 186 101 NA 189
# 5  NA 220 175 NA  95
# 6  NA 264 314 NA  92
#
# , , wind
#
#      5    6    7    8    9
# 1  7.4  8.6  4.1  6.9  6.9
# 2  8.0  9.7  9.2 13.8  5.1
# 3 12.6 16.1  9.2  7.4  2.8
# 4 11.5  9.2 10.9  6.9  4.6
# 5 14.3  8.6  4.6  7.4  7.4
# 6 14.9 14.3 10.9  4.6 15.5
#
# , , temp
#
#    5  6  7  8  9
# 1 67 78 84 81 91
# 2 72 74 85 81 92
# 3 74 67 81 82 93
# 4 62 84 84 86 93
# 5 56 85 83 85 87
# 6 66 79 83 87 84
acast(aqm, month ~ variable, mean)
#      ozone  solar.r      wind     temp
# 5 23.61538 181.2963 11.622581 65.54839
# 6 29.44444 190.1667 10.266667 79.10000
# 7 59.11538 216.4839  8.941935 83.90323
# 8 59.96154 171.8571  8.793548 83.96774
# 9 31.44828 167.4333 10.180000 76.90000
acast(aqm, month ~ variable, mean, margins = TRUE)
#          ozone  solar.r      wind     temp    (all)
# 5     23.61538 181.2963 11.622581 65.54839 68.70696
# 6     29.44444 190.1667 10.266667 79.10000 87.38384
# 7     59.11538 216.4839  8.941935 83.90323 93.49748
# 8     59.96154 171.8571  8.793548 83.96774 79.71207
# 9     31.44828 167.4333 10.180000 76.90000 71.82689
# (all) 42.12931 185.9315  9.957516 77.88235 80.05722
dcast(aqm, month ~ variable, mean, margins = c("month", "variable"))
#   month    ozone  solar.r      wind     temp    (all)
# 1     5 23.61538 181.2963 11.622581 65.54839 68.70696
# 2     6 29.44444 190.1667 10.266667 79.10000 87.38384
# 3     7 59.11538 216.4839  8.941935 83.90323 93.49748
# 4     8 59.96154 171.8571  8.793548 83.96774 79.71207
# 5     9 31.44828 167.4333 10.180000 76.90000 71.82689
# 6 (all) 42.12931 185.9315  9.957516 77.88235 80.05722
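The role of the aggregation function in the calls above may be easier to see on data small enough to check by hand. The following is a minimal sketch; the data frame `molten` and its values are hypothetical, and `value.var` is given explicitly rather than guessed.

```r
library(reshape2)

# Hypothetical molten data: two readings per month for each variable.
molten <- data.frame(
  month    = c(5, 5, 6, 6, 5, 5, 6, 6),
  variable = rep(c("temp", "wind"), each = 4),
  value    = c(60, 70, 80, 90, 10, 12, 8, 6)
)

# Each month/variable cell holds two values, so dcast() needs a function
# that reduces them to a single number.
monthly_mean <- dcast(molten, month ~ variable,
                      fun.aggregate = mean, value.var = "value")
monthly_sum  <- dcast(molten, month ~ variable,
                      fun.aggregate = sum, value.var = "value")

print(monthly_mean)  # month 5: temp 65, wind 11; month 6: temp 85, wind 7
print(monthly_sum)   # month 5: temp 130, wind 22; month 6: temp 170, wind 14
```

If no aggregation function is given and a cell holds more than one value, reshape2 falls back to `length` and reports that in a message.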
### 3.2 melt(): convert wide data to long data (convert an object into a molten data frame)
aqm <- melt(airquality, id.vars = c("month", "day"), na.rm = TRUE)
head(aqm)
#   month day variable value
# 1     5   1    ozone    41
# 2     5   2    ozone    36
# 3     5   3    ozone    12
# 4     5   4    ozone    18
# 6     5   6    ozone    28
# 7     5   7    ozone    23
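Row 5 is missing from the output above because `na.rm = TRUE` drops rows whose value is NA. A small sketch of that effect follows; the names `aq`, `long_all` and `long_clean` are illustrative, and a fresh copy of the dataset is used so the example does not depend on earlier renaming.

```r
library(reshape2)

# Fresh copy of the built-in dataset with lowercase column names.
aq <- datasets::airquality
names(aq) <- tolower(names(aq))

long_all   <- melt(aq, id.vars = c("month", "day"))                # keeps NA rows
long_clean <- melt(aq, id.vars = c("month", "day"), na.rm = TRUE)  # drops NA rows

nrow(long_all)              # 153 observations x 4 measured variables
sum(is.na(long_all$value))  # number of missing measurements
nrow(long_clean)            # rows left once the missing ones are dropped
```

Whether to drop missing values at the melt step is a judgment call: dropping them keeps later aggregations such as `mean` simple, but the row numbers no longer line up with the original data frame.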
### 3.3 colsplit()
?colsplit
# Split a vector into multiple columns
x <- c("a_1_T", "a_2_F", "b_2_T", "c_3_F")
vars <- colsplit(x, "_", c("trt", "time", "Boolean_value"))
vars
#   trt time Boolean_value
# 1   a    1          TRUE
# 2   a    2         FALSE
# 3   b    2          TRUE
# 4   c    3         FALSE
str(vars)
# 'data.frame': 4 obs. of  3 variables:
#  $ trt          : chr  "a" "a" "b" "c"
#  $ time         : int  1 2 2 3
#  $ Boolean_value: logi  TRUE FALSE TRUE FALSE
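As the `str()` output suggests, `colsplit()` converts each resulting column to a suitable type. A second sketch with a different separator is shown below; the vector `labels` and the column names `group` and `replicate` are made up for illustration.

```r
library(reshape2)

# Hypothetical labels that each encode two pieces of information.
labels <- c("ctrl-1", "ctrl-2", "drug-1", "drug-2")

# Split on "-" into two named columns.
parts <- colsplit(labels, pattern = "-", names = c("group", "replicate"))
str(parts)

# The text part stays character; the numeric part becomes integer.
class(parts$group)
class(parts$replicate)
```

This is handy after `melt()` when the original column names packed several variables into one string.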
### 3.4 recast(): melt and cast in a single step
?recast
recast(french_fries, time ~ variable, id.var = 1:4)
# Aggregation function missing: defaulting to length
#    time potato buttery grassy rancid painty
# 1     1     72      72     72     72     72
# 2     2     72      72     72     72     72
# 3     3     72      72     72     72     72
# 4     4     72      72     72     72     72
# 5     5     72      72     72     72     72
# 6     6     72      72     72     72     72
# 7     7     72      72     72     72     72
# 8     8     72      72     72     72     72
# 9     9     60      60     60     60     60
# 10   10     60      60     60     60     60
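To make the "single step" concrete, the sketch below compares `recast()` with an explicit `melt()` followed by `dcast()`. Passing `fun.aggregate = length` explicitly is a choice made here to avoid the defaulting message; the names `molten`, `two_step` and `one_step` are illustrative.

```r
library(reshape2)

# Two steps: melt with the first four columns as id variables, then cast.
molten   <- melt(french_fries, id.vars = 1:4)
two_step <- dcast(molten, time ~ variable, fun.aggregate = length)

# One step: recast() melts and casts; extra arguments go on to the cast.
one_step <- recast(french_fries, time ~ variable, id.var = 1:4,
                   fun.aggregate = length)

# Both routes should give the same table of counts per time point.
all.equal(one_step, two_step)
```

Note that `recast()` takes `id.var` and `measure.var` (singular), whereas `melt()` takes `id.vars` and `measure.vars`.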
### 3.5 reshape2: built-in data
str(tips)
# 'data.frame': 244 obs. of  7 variables:
#  $ total_bill: num  17 10.3 21 23.7 24.6 ...
#  $ tip       : num  1.01 1.66 3.5 3.31 3.61 4.71 2 3.12 1.96 3.23 ...
#  $ sex       : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 2 2 2 2 2 ...
#  $ smoker    : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
#  $ day       : Factor w/ 4 levels "Fri","Sat","Sun",..: 3 3 3 3 3 3 3 3 3 3 ...
#  $ time      : Factor w/ 2 levels "Dinner","Lunch": 1 1 1 1 1 1 1 1 1 1 ...
#  $ size      : int  2 3 3 2 4 4 2 4 2 2 ...
# One waiter recorded 244 tips in all. The data was reported in a collection of case studies for business statistics (Bryant & Smith 1995).
str(smiths)
# 'data.frame': 2 obs. of  5 variables:
#  $ subject: Factor w/ 2 levels "John Smith","Mary Smith": 1 2
#  $ time   : int  1 1
#  $ age    : num  33 NA
#  $ weight : num  90 NA
#  $ height : num  1.87 1.54
# A small demo dataset describing John and Mary Smith. Used in the introductory vignette.
str(french_fries)
# 'data.frame': 696 obs. of  9 variables:
#  $ time     : Factor w/ 10 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
#  $ treatment: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
#  $ subject  : Factor w/ 12 levels "3","10","15",..: 1 1 2 2 3 3 4 4 5 5 ...
#  $ rep      : num  1 2 1 2 1 2 1 2 1 2 ...
#  $ potato   : num  2.9 14 11 9.9 1.2 8.8 9 8.2 7 13 ...
#  $ buttery  : num  0 0 6.4 5.9 0.1 3 2.6 4.4 3.2 0 ...
#  $ grassy   : num  0 0 0 2.9 0 3.6 0.4 0.3 0 3.1 ...
#  $ rancid   : num  0 1.1 0 2.2 1.1 1.5 0.1 1.4 4.9 4.3 ...
#  $ painty   : num  5.5 0 0 0 5.1 2.3 0.2 4 3.2 10.3 ...
# This data was collected from a sensory experiment conducted at Iowa State University in 2004. The investigators were interested in the effect that using three different fryer oils had on the taste of the fries.
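The smiths dataset is small enough to follow a full melt and cast by eye. A minimal sketch, assuming `subject` and `time` are treated as the id variables (the names `smiths_long` and `smiths_wide` are illustrative):

```r
library(reshape2)

# Stack age, weight and height; subject and time identify each row.
# na.rm = TRUE drops Mary Smith's missing age and weight.
smiths_long <- melt(smiths, id.vars = c("subject", "time"), na.rm = TRUE)
print(smiths_long)

# Cast back to one row per subject and time; the cells that were dropped
# reappear as NA because no value exists for them.
smiths_wide <- dcast(smiths_long, subject + time ~ variable)
print(smiths_wide)
```

No aggregation function is needed here because each subject/time/variable combination has at most one value.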
### 3.6 View the reshape2 package description
help(package="reshape2")
# Package: reshape2
# Title: Flexibly Reshape Data: A Reboot of the Reshape Package
# Version: 1.4.4
# Author: Hadley Wickham <h.wickham@gmail.com>
# Maintainer: Hadley Wickham <h.wickham@gmail.com>
# Description: Flexibly restructure and aggregate data using just two
#     functions: melt and 'dcast' (or 'acast').
# License: MIT + file LICENSE
# URL: https://github.com/hadley/reshape
# BugReports: https://github.com/hadley/reshape/issues
# Depends: R (>= 3.1)
# Imports: plyr (>= 1.8.1), Rcpp, stringr
# Suggests: covr, lattice, testthat (>= 0.8.0)
# LinkingTo: Rcpp
# Encoding: UTF-8
# LazyData: true
# RoxygenNote: 7.1.0
# NeedsCompilation: yes
# Packaged: 2020-04-09 12:27:19 UTC; hadley
# Repository: CRAN
# Date/Publication: 2020-04-09 13:50:02 UTC
# Built: R 4.0.0; x86_64-w64-mingw32; 2020-05-02 21:38:15 UTC; windows
# Archs: i386, x64
# 4. Wrap-up
sessionInfo()
# R version 4.0.3 (2020-10-10)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 18363)
#
# Matrix products: default
#
# locale:
# [1] LC_COLLATE=Chinese (Simplified)_China.936
# [2] LC_CTYPE=Chinese (Simplified)_China.936
# [3] LC_MONETARY=Chinese (Simplified)_China.936
# [4] LC_NUMERIC=C
# [5] LC_TIME=Chinese (Simplified)_China.936
#
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base
#
# other attached packages:
# [1] reshape2_1.4.4
#
# loaded via a namespace (and not attached):
# [1] compiler_4.0.3 magrittr_2.0.1 plyr_1.8.6     tools_4.0.3    yaml_2.2.1
# [6] Rcpp_1.0.6     tinytex_0.29   stringi_1.5.3  stringr_1.4.0  xfun_0.21