R Basics for Beginners (4): Creating New Columns with mutate

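Today we continue with mutate, an important function in the dplyr package whose basic job is to create new columns. The options inside mutate are close to endless: by combining functions you can transform a dataset in almost any way you like. The examples below show how.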

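This time we use the msleep dataset, which ships with ggplot2 (loaded as part of the tidyverse) and records the sleep times of mammals. Let's load the package and look at the data first: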

library(tidyverse)
msleep
   name    genus  vore  order conservation sleep_total sleep_rem sleep_cycle
   <chr>   <chr>  <chr> <chr> <chr>              <dbl>     <dbl>       <dbl>
 1 Cheetah Acino~ carni Carn~ lc                  12.1      NA        NA    
 2 Owl mo~ Aotus  omni  Prim~ NA                  17         1.8      NA    
 3 Mounta~ Aplod~ herbi Rode~ nt                  14.4       2.4      NA    
 4 Greate~ Blari~ omni  Sori~ lc                  14.9       2.3       0.133

Basic mutate operations

The simplest operation is a calculation based on values in other columns. In the example code, we convert the sleep data from hours to minutes.

msleep %>%
  select(name,sleep_total) %>%
  mutate(sleep_total_min = sleep_total * 60)
   name                       sleep_total sleep_total_min
   <chr>                            <dbl>           <dbl>
 1 Cheetah                           12.1             726
 2 Owl monkey                        17              1020
 3 Mountain beaver                   14.4             864
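A column created inside mutate can be used by later expressions in the same call. The sketch below shows this on a small made-up tibble (sleep_toy and awake_min are hypothetical names, not part of msleep):

```r
library(dplyr)

# Hypothetical toy data, not taken from msleep
sleep_toy <- tibble(
  name = c("a", "b", "c"),
  sleep_total = c(12, 8.5, 3)
)

sleep_min <- sleep_toy %>%
  mutate(
    sleep_total_min = sleep_total * 60,     # hours -> minutes
    awake_min = 24 * 60 - sleep_total_min   # reuses the column created one line above
  )

sleep_min
```

Because the expressions are evaluated in order, awake_min has to come after sleep_total_min.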

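The following code creates two new columns: one shows the difference between each animal's sleep time and the mean sleep time (rounded to one decimal place with round()), and the other shows the difference from the animal that sleeps the least.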

msleep %>%
  select(name, sleep_total) %>%
  mutate(AVG = sleep_total - round(mean(sleep_total), 1),
         MIN = sleep_total - min(sleep_total))
# A tibble: 83 x 4
   name                       sleep_total    AVG   MIN
   <chr>                            <dbl>  <dbl> <dbl>
 1 Cheetah                           12.1  1.7    10.2
 2 Owl monkey                        17    6.6    15.1
 3 Mountain beaver                   14.4  4      12.5
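The call to mean(sleep_total) works here because sleep_total has no missing values. For a column that does contain NA, such as sleep_rem, a summary function returns NA unless told otherwise. A minimal sketch on made-up data (toy, diff_naive and diff_narm are hypothetical names):

```r
library(dplyr)

# Hypothetical toy data with one missing value
toy <- tibble(
  name = c("a", "b", "c"),
  sleep_rem = c(2, NA, 4)
)

centered <- toy %>%
  mutate(
    diff_naive = sleep_rem - mean(sleep_rem),                # mean() is NA, so every row becomes NA
    diff_narm  = sleep_rem - mean(sleep_rem, na.rm = TRUE)   # mean of 2 and 4 is 3
  )

centered
```

With na.rm = TRUE only the row that was missing to begin with stays NA.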

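To take the mean of selected columns row by row, use rowwise( ), which tells dplyr to evaluate the following operations one row at a time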

msleep %>%
  select(name, contains("sleep")) %>%
  rowwise() %>%
  mutate(avg = mean(c(sleep_rem,sleep_cycle)))
   name                sleep_total sleep_rem sleep_cycle    avg
   <chr>                     <dbl>     <dbl>       <dbl>  <dbl>
 1 Cheetah                    12.1      NA        NA     NA    
 2 Owl monkey                 17         1.8      NA     NA    
 3 Mountain beaver            14.4       2.4      NA     NA    
 4 Greater short-tail~        14.9       2.3       0.133  1.22 
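In the output above, any row with a missing sleep_rem or sleep_cycle gets NA as its average. If you would rather average whatever values are present, one option is to pass na.rm = TRUE. The sketch below uses made-up data (toy and row_means are hypothetical names):

```r
library(dplyr)

# Hypothetical toy data mirroring the NA pattern in msleep
toy <- tibble(
  name = c("a", "b", "c"),
  sleep_rem = c(NA, 1.8, 2.3),
  sleep_cycle = c(NA, NA, 0.5)
)

row_means <- toy %>%
  rowwise() %>%
  mutate(avg = mean(c(sleep_rem, sleep_cycle), na.rm = TRUE)) %>%
  ungroup()   # drop the row-wise grouping so later steps work on whole columns again

row_means
```

Note that a row where both values are missing yields NaN rather than NA, because the mean of an empty vector is NaN.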

An ifelse statement lets you modify values conditionally: if brainwt > 4 it returns NA, otherwise it returns the original value

msleep %>%
  select(name, brainwt) %>%
  mutate(brainwt2 = ifelse(brainwt > 4, NA, brainwt)) %>%
  arrange(desc(brainwt))
   name             brainwt brainwt2
   <chr>              <dbl>    <dbl>
 1 African elephant   5.71    NA    
 2 Asian elephant     4.60    NA    
 3 Human              1.32     1.32 
 4 Horse              0.655    0.655

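You can also combine mutate with stringr functions or regular expressions to work on string columns.
The example code returns the last word of each animal name and converts it to lowercase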

msleep %>%
  select(name) %>%
  mutate(name_last_word = tolower(str_extract(name, pattern = "\\w+$")))
   name                       name_last_word
   <chr>                      <chr>         
 1 Cheetah                    cheetah       
 2 Owl monkey                 monkey        
 3 Mountain beaver            beaver        
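The pattern "\\w+$" matches the final run of word characters, so a hyphen counts as a word boundary. If you want the last space-separated token instead, stringr's word() with a negative index is one alternative. A small sketch on made-up names (the vector and column names are hypothetical):

```r
library(dplyr)
library(stringr)

# Hypothetical names, including one whose last token is hyphenated
animals <- tibble(name = c("Owl monkey", "Cheetah", "Tree-shrew"))

last_words <- animals %>%
  mutate(
    by_regex = tolower(str_extract(name, "\\w+$")),   # stops at the hyphen
    by_word  = tolower(word(name, -1))                # splits on spaces only
  )

last_words
```

The two columns differ only for the hyphenated name.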

Operating on several columns at once

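  • mutate_all() applies the operation to every column
  • mutate_if() first takes a predicate function that returns a logical value; the mutate operation is applied to the columns for which it returns TRUE
  • mutate_at() requires you to specify the columns to change inside vars()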
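Since dplyr 1.0.0 these three scoped verbs are superseded by across(), although they still work. As a rough guide, the sketch below shows one across() equivalent for each, on a small made-up tibble (toy and the result names are hypothetical):

```r
library(dplyr)

# Hypothetical toy data shaped like a slice of msleep
toy <- tibble(
  name = c("Cheetah", "Owl monkey"),
  sleep_total = c(12.1, 17),
  sleep_rem = c(NA, 1.8),
  awake = c(11.9, 7)
)

all_chr <- toy %>% mutate(across(everything(), as.character))    # like mutate_all()
rounded <- toy %>% mutate(across(where(is.numeric), round))      # like mutate_if()
in_min  <- toy %>% mutate(across(contains("sleep"), ~ .x * 60))  # like mutate_at()

in_min
```

The first argument of across() takes the same selection helpers as select(), so one function covers all three cases.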

Convert all values to lowercase (note that every column becomes character):

msleep %>% mutate_all(tolower)
   name    genus vore  order conservation sleep_total sleep_rem
   <chr>   <chr> <chr> <chr> <chr>        <chr>       <chr>    
 1 cheetah acin~ carni carn~ lc           12.1        NA       
 2 owl mo~ aotus omni  prim~ NA           17          1.8      
 3 mounta~ aplo~ herbi rode~ nt           14.4        2.4      

Append " /n " to every column

msleep %>% mutate_all(~paste(., "  /n  "))

Remove every " /n " again and trim the leftover whitespace

msleep_ohno <- msleep %>% mutate_all(~paste(., "  /n  ")) 

msleep_ohno %>%
  mutate_all(~str_replace_all(., "/n", "")) %>%
  mutate_all(str_trim)
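One thing to keep in mind: paste() turns every column into character, so after the cleanup the numbers are still strings and missing values have become the literal text "NA". The sketch below, on made-up data, shows one way to clean the text and restore a numeric column (toy, messy and clean are hypothetical names):

```r
library(dplyr)
library(stringr)

# Hypothetical toy data with one missing value
toy <- tibble(
  name = c("Cheetah", "Owl monkey"),
  sleep_rem = c(NA, 1.8)
)

# Reproduce the mess: every column becomes character
messy <- toy %>% mutate(across(everything(), ~ paste(.x, "  /n  ")))

clean <- messy %>%
  mutate(across(everything(), ~ str_trim(str_remove_all(.x, fixed("/n"))))) %>%
  mutate(sleep_rem = as.numeric(na_if(sleep_rem, "NA")))   # text "NA" -> real NA, then back to numeric

clean
```

Here fixed("/n") matches the two characters literally, and na_if() is applied to a single character column.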

mutate_if(): operating on columns that satisfy a condition

If a column is numeric, round it to the nearest whole number

msleep %>%
  select(name, sleep_total:bodywt) %>%
  mutate_if(is.numeric, round)
   name                       sleep_total sleep_rem sleep_cycle awake brainwt bodywt
   <chr>                            <dbl>     <dbl>       <dbl> <dbl>   <dbl>  <dbl>
 1 Cheetah                             12        NA          NA    12      NA     50
 2 Owl monkey                          17         2          NA     7       0      0
 3 Mountain beaver                     14         2          NA    10      NA      1

mutate_at( ): operating on specific columns

Apply the operation to the columns whose names contain sleep

msleep %>%
  select(name, sleep_total:awake) %>%
  mutate_at(vars(contains("sleep")), ~(.*60))
   name                       sleep_total sleep_rem sleep_cycle awake
   <chr>                            <dbl>     <dbl>       <dbl> <dbl>
 1 Cheetah                            726        NA       NA     11.9
 2 Owl monkey                        1020       108       NA      7  
 3 Mountain beaver                    864       144       NA      9.6

Renaming the columns

msleep %>%
  select(name, sleep_total:awake) %>%
  mutate_at(vars(contains("sleep")), ~(.*60)) %>%
  rename_at(vars(contains("sleep")), ~paste0(.,"_min"))
   name                       sleep_total_min sleep_rem_min sleep_cycle_min awake
   <chr>                                <dbl>         <dbl>           <dbl> <dbl>
 1 Cheetah                                726            NA           NA     11.9
 2 Owl monkey                            1020           108           NA      7  
 3 Mountain beaver                        864           144           NA      9.6

Keeping the original columns

msleep %>%
  select(name, sleep_total:awake) %>%
  # funs() is deprecated since dplyr 0.8.0; a named list of lambdas gives the same _min suffix
  mutate_at(vars(contains("sleep")), list(min = ~ . * 60))
   name           sleep_total sleep_rem sleep_cycle awake sleep_total_min sleep_rem_min sleep_cycle_min
   <chr>                <dbl>     <dbl>       <dbl> <dbl>           <dbl>         <dbl>           <dbl>
 1 Cheetah               12.1      NA        NA      11.9             726            NA           NA   
 2 Owl monkey            17         1.8      NA       7              1020           108           NA   
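With dplyr 1.0.0 or later, the same "keep the originals and add suffixed copies" pattern can be written with across() and its .names argument. A minimal sketch on made-up data (toy and with_min are hypothetical names):

```r
library(dplyr)

# Hypothetical toy data shaped like a slice of msleep
toy <- tibble(
  name = c("Cheetah", "Owl monkey"),
  sleep_total = c(12.1, 17),
  sleep_rem = c(NA, 1.8),
  awake = c(11.9, 7)
)

with_min <- toy %>%
  mutate(across(contains("sleep"), ~ .x * 60, .names = "{.col}_min"))

names(with_min)
```

In the .names template, {.col} stands for the name of the column being transformed, and the new columns are appended after the existing ones.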

Creating a two-level discrete column with ifelse

msleep %>%
  select(name, sleep_total) %>%
  mutate(sleep_time = ifelse(sleep_total > 10, "long", "short"))
   name                       sleep_total sleep_time
   <chr>                            <dbl> <chr>     
 1 Cheetah                           12.1 long      
 2 Owl monkey                        17   long      
 3 Mountain beaver                   14.4 long      
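dplyr also provides if_else(), a stricter variant that checks that both branches have compatible types and lets you choose what missing inputs become. The sketch below uses made-up data; the "unknown" label is only an illustration:

```r
library(dplyr)

# Hypothetical toy data with one missing sleep time
toy <- tibble(
  name = c("a", "b", "c"),
  sleep_total = c(12.1, 5, NA)
)

labelled <- toy %>%
  mutate(sleep_time = if_else(sleep_total > 10, "long", "short",
                              missing = "unknown"))   # used where the condition is NA

labelled
```

Without the missing argument, the third row would simply be NA, as it would be with base ifelse.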

Creating a multi-level discrete column with case_when

This function is very useful in later data cleaning work, so it is worth practising

msleep %>%
  select(name, sleep_total) %>%
  mutate(sleep_total_discr = case_when(
    sleep_total > 13 ~ "very long",
    sleep_total > 10 ~ "long",
    sleep_total > 7 ~ "limited",
    TRUE ~ "short"))
   name                       sleep_total sleep_total_discr
   <chr>                            <dbl> <chr>            
 1 Cheetah                           12.1 long             
 2 Owl monkey                        17   very long        
 3 Mountain beaver                   14.4 very long        
 4 Greater short-tailed shrew        14.9 very long        
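case_when checks its conditions from top to bottom and uses the first one that is TRUE, so the order of the rules matters. A missing value also fails every comparison and ends up in the TRUE fallback unless you handle it first. The sketch below illustrates both points on made-up data (toy, discr and wrong_order are hypothetical names):

```r
library(dplyr)

# Hypothetical toy data, including a missing value
toy <- tibble(sleep_total = c(17, 12.1, 8, 3, NA))

binned <- toy %>%
  mutate(
    discr = case_when(
      is.na(sleep_total) ~ NA_character_,   # handle missing values explicitly, before the fallback
      sleep_total > 13 ~ "very long",
      sleep_total > 10 ~ "long",
      sleep_total > 7 ~ "limited",
      TRUE ~ "short"),
    wrong_order = case_when(
      sleep_total > 7 ~ "limited",          # catches everything above 7 first
      sleep_total > 13 ~ "very long",       # never reached
      TRUE ~ "short")                       # also swallows the NA row
  )

binned
```

Putting the strictest condition first, as the article's example does, avoids the problem seen in wrong_order.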

Converting specific values to NA

msleep %>%
  select(name:order) %>%
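  # na_if() works on vectors; dplyr 1.1.0+ no longer accepts a whole data frame here
  mutate(across(where(is.character), ~ na_if(.x, "omni")))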
   name                       genus       vore  order       
   <chr>                      <chr>       <chr> <chr>       
 1 Cheetah                    Acinonyx    carni Carnivora   
 2 Owl monkey                 Aotus       NA    Primates    
 3 Mountain beaver            Aplodontia  herbi Rodentia    
 4 Greater short-tailed shrew Blarina     NA    Soricomorpha