<Summary> awk Usage Manual

awk is a tool for analyzing and processing text files.

0x00 Usage

awk 'pattern {action}' [filenames]

pattern selects the data we want to extract. It is usually a regular expression, which must be enclosed in slashes.

action is the operation performed whenever matching data is found.
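As a minimal sketch of the three common shapes of an awk program (the three input lines and the `demo_input` helper are made up for illustration and do not come from any real file):

```shell
# Hypothetical three-line input used only for this illustration.
demo_input() {
  printf 'alpha 1\nbeta 2\ngamma 3\n'
}

# Pattern only: matching lines are printed, because the default action is print.
pattern_only=$(demo_input | awk '/^g/')

# Action only: the action runs for every line.
action_only=$(demo_input | awk '{print $2}')

# Pattern and action: the action runs only for lines that match.
both=$(demo_input | awk '/^b/ {print $1}')

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$both"
```

Either part may be omitted: a missing pattern matches every line, and a missing action prints the matching line.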

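0x01 Examples

awk works like this: it reads one record at a time, with records delimited by the newline character \n, splits the record into fields using the field separator (by default, runs of spaces and tabs), and populates the field variables. $0 is the entire record, $1 is the first field, and $n is the nth field.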
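The field splitting described above can be sketched on a single made-up record (the record content is an assumption chosen to mix spaces and a tab):

```shell
# One made-up record; by default fields are split on runs of spaces and tabs.
# NF is the number of fields, so $NF is the last field.
fields_demo=$(printf 'root   pts/4\t172.20.3.158\n' | awk '{print NF, $1, $NF}')
printf '%s\n' "$fields_demo"
```

The three spaces and the tab each count as a single separator, so the record has three fields.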

For example, run last -n 5:

last -n 5
root     pts/4        172.20.3.158     Mon Aug  1 11:20   still logged in   
root     pts/3        172.20.3.158     Mon Aug  1 10:58   still logged in   
root     pts/2        172.20.3.158     Mon Aug  1 10:57   still logged in   
root     pts/1        172.20.3.158     Mon Aug  1 10:57   still logged in   
root     pts/0        172.20.3.158     Mon Aug  1 10:57   still logged in   

wtmp begins Mon Apr 25 17:46:29 2016

Splitting this output with the default separator gives the first and second fields:

last -n 5 | awk '{print $1,$2}'
root pts/4
root pts/3
root pts/2
root pts/1
root pts/0

wtmp begins

Next we set the field separator ourselves, which is usually done with -F, and print the selected fields separated by \t.

cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
at      /bin/bash
bin     /bin/bash
daemon  /bin/bash
ftp     /bin/bash
ftpsecure       /bin/false
games   /bin/bash
gdm     /bin/false
lp      /bin/bash
mail    /bin/false
man     /bin/bash
messagebus      /bin/false
news    /bin/bash
nobody  /bin/bash
nscd    /sbin/nologin
ntp     /bin/false
openslp /sbin/nologin
polkitd /sbin/nologin
postfix /bin/false
pulse   /sbin/nologin
root    /bin/zsh
rpc     /sbin/nologin
rtkit   /bin/false
scard   /usr/sbin/nologin
sshd    /bin/false
statd   /sbin/nologin
usbmux  /sbin/nologin
uucp    /bin/bash
vnc     /sbin/nologin
wwwrun  /bin/false
edward  /bin/zsh
ftp-edward      /bin/bash
lighthttpd      /bin/bash

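Next we use BEGIN, the main body (called PROC here), and END to control the program's execution flow. awk runs the BEGIN block first, then reads the file record by record (records are delimited by \n), splits each record into fields and runs the main body on it, and finally runs the END block once every record has been processed.

Now let's modify the program above so that it first prints the header name,shell and finishes with the line Action Finished.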

cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "Action Finished"}'
name,shell
at,/bin/bash
bin,/bin/bash
daemon,/bin/bash
ftp,/bin/bash
ftpsecure,/bin/false
games,/bin/bash
gdm,/bin/false
lp,/bin/bash
mail,/bin/false
man,/bin/bash
messagebus,/bin/false
news,/bin/bash
nobody,/bin/bash
nscd,/sbin/nologin
ntp,/bin/false
openslp,/sbin/nologin
polkitd,/sbin/nologin
postfix,/bin/false
pulse,/sbin/nologin
root,/bin/zsh
rpc,/sbin/nologin
rtkit,/bin/false
scard,/usr/sbin/nologin
sshd,/bin/false
statd,/sbin/nologin
usbmux,/sbin/nologin
uucp,/bin/bash
vnc,/sbin/nologin
wwwrun,/bin/false
edward,/bin/zsh
ftp-edward,/bin/bash
lighthttpd,/bin/bash
Action Finished

So how do we get the shell of the root account from /etc/passwd?

awk -F ':' '/root/{print $7}' /etc/passwd
/bin/zsh

The text between the slashes is the pattern: if the current line matches the regular expression root, the action is applied to that line.
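Note that /root/ matches any line containing "root" anywhere, not only the root account. A minimal sketch of the difference, using two made-up passwd-style lines (the `backup` account and the `passwd_sample` helper are hypothetical):

```shell
# Made-up passwd-style lines: the second one only contains "root" in its home path.
passwd_sample() {
  printf 'root:x:0:0:root:/root:/bin/zsh\n'
  printf 'backup:x:34:34:backup:/root/backup:/bin/false\n'
}

# Regex over the whole line: both lines match.
regex_match=$(passwd_sample | awk -F ':' '/root/ {print $7}')

# Comparison on the first field only: just the root account matches.
exact_match=$(passwd_sample | awk -F ':' '$1 == "root" {print $7}')

printf '%s\n---\n%s\n' "$regex_match" "$exact_match"
```

Comparing a specific field is usually the safer choice when the goal is to select one account.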

0x02 Built-in Variables

awk has many built-in variables that hold environment information; most of them can be modified.

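ARGC               number of command-line arguments
ARGV               array of command-line arguments
ENVIRON            associative array of the system environment variables
FILENAME           name of the file awk is currently reading
FNR                record number within the current file
FS                 input field separator; equivalent to the -F command-line option
NF                 number of fields in the current record
NR                 number of records read so far
OFS                output field separator
ORS                output record separator
RS                 input record separator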
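A small sketch of a few of these variables working together, on two made-up input lines: FS and OFS are assigned in BEGIN, and NR and NF are read for each record.

```shell
# FS splits the input on ":", OFS joins the comma-separated print arguments with "-".
# NR numbers the records and NF counts the fields of each one.
vars_demo=$(printf 'a:b:c\nd:e\n' | awk 'BEGIN {FS = ":"; OFS = "-"} {print NR, NF, $1, $NF}')
printf '%s\n' "$vars_demo"
```

OFS only takes effect between arguments separated by commas in print; strings written next to each other are concatenated without any separator.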

Let's try them out:

awk -F ':' 'BEGIN {print "ARGC:" ARGC " ARGV:" ARGV[0]","ARGV[1] " Filename:" FILENAME " Total:" FNR " Field Separator:" FS " Row Separator:" RS}{print "currLine:" NR " currColumns:" NF " content:" $0}' /etc/passwd
ARGC:2 ARGV:awk,/etc/passwd Filename: Total:0 Field Separator:: Row Separator:

currLine:1 currColumns:7 content:at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash
currLine:2 currColumns:7 content:bin:x:1:1:bin:/bin:/bin/bash
currLine:3 currColumns:7 content:daemon:x:2:2:Daemon:/sbin:/bin/bash
currLine:4 currColumns:7 content:ftp:x:40:49:FTP account:/srv/ftp:/bin/bash
currLine:5 currColumns:7 content:ftpsecure:x:488:65534:Secure FTP User:/var/lib/empty:/bin/false
currLine:6 currColumns:7 content:games:x:12:100:Games account:/var/games:/bin/bash
currLine:7 currColumns:7 content:gdm:x:486:485:Gnome Display Manager daemon:/var/lib/gdm:/bin/false
currLine:8 currColumns:7 content:lp:x:4:7:Printing daemon:/var/spool/lpd:/bin/bash
currLine:9 currColumns:7 content:mail:x:8:12:Mailer daemon:/var/spool/clientmqueue:/bin/false
currLine:10 currColumns:7 content:man:x:13:62:Manual pages viewer:/var/cache/man:/bin/bash
currLine:11 currColumns:7 content:messagebus:x:499:499:User for D-Bus:/var/run/dbus:/bin/false
currLine:12 currColumns:7 content:news:x:9:13:News system:/etc/news:/bin/bash
currLine:13 currColumns:7 content:nobody:x:65534:65533:nobody:/var/lib/nobody:/bin/bash
currLine:14 currColumns:7 content:nscd:x:496:495:User for nscd:/run/nscd:/sbin/nologin
currLine:15 currColumns:7 content:ntp:x:74:492:NTP daemon:/var/lib/ntp:/bin/false
currLine:16 currColumns:7 content:openslp:x:494:2:openslp daemon:/var/lib/empty:/sbin/nologin
currLine:17 currColumns:7 content:polkitd:x:497:496:User for polkitd:/var/lib/polkit:/sbin/nologin
currLine:18 currColumns:7 content:postfix:x:51:51:Postfix Daemon:/var/spool/postfix:/bin/false
currLine:19 currColumns:7 content:pulse:x:490:489:PulseAudio daemon:/var/lib/pulseaudio:/sbin/nologin
currLine:20 currColumns:7 content:root:x:0:0:root:/root:/bin/zsh
currLine:21 currColumns:7 content:rpc:x:495:65534:user for rpcbind:/var/lib/empty:/sbin/nologin
currLine:22 currColumns:7 content:rtkit:x:491:490:RealtimeKit:/proc:/bin/false
currLine:23 currColumns:7 content:scard:x:487:487:Smart Card Reader:/var/run/pcscd:/usr/sbin/nologin
currLine:24 currColumns:7 content:sshd:x:498:498:SSH daemon:/var/lib/sshd:/bin/false
currLine:25 currColumns:7 content:statd:x:489:65534:NFS statd daemon:/var/lib/nfs:/sbin/nologin
currLine:26 currColumns:7 content:usbmux:x:493:65534:usbmuxd daemon:/var/lib/usbmuxd:/sbin/nologin
currLine:27 currColumns:7 content:uucp:x:10:14:Unix-to-Unix CoPy system:/etc/uucp:/bin/bash
currLine:28 currColumns:7 content:vnc:x:492:491:user for VNC:/var/lib/empty:/sbin/nologin
currLine:29 currColumns:7 content:wwwrun:x:30:8:WWW daemon apache:/var/lib/wwwrun:/bin/false
currLine:30 currColumns:7 content:edward:x:1000:100:Edward:/home/edward:/bin/zsh
currLine:31 currColumns:7 content:ftp-edward:x:1001:100::/home/ftp-edward:/bin/bash
currLine:32 currColumns:7 content:lighthttpd:x:1004:1000::/home/lighthttpd:/bin/bash

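This shows that inside BEGIN, before the input file has been read, FILENAME is still empty and FNR is 0, whereas FS (set by -F) and RS (a newline by default) are already defined. We therefore move the file-dependent values into END: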

awk -F ':' 'BEGIN {
  print "ARGC:" ARGC
  print "ARGV:"
  for (i=0;i<ARGC;i++)
    print ARGV[i]
}{
  print "currLine:" NR " currColumns:" NF " content:" $0
} END {
  print "Filename:" FILENAME " Total:" FNR
}' /etc/passwd
ARGC:2
ARGV:
awk
/etc/passwd
currLine:1 currColumns:7 content:at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash
currLine:2 currColumns:7 content:bin:x:1:1:bin:/bin:/bin/bash
currLine:3 currColumns:7 content:daemon:x:2:2:Daemon:/sbin:/bin/bash
currLine:4 currColumns:7 content:ftp:x:40:49:FTP account:/srv/ftp:/bin/bash
currLine:5 currColumns:7 content:ftpsecure:x:488:65534:Secure FTP User:/var/lib/empty:/bin/false
currLine:6 currColumns:7 content:games:x:12:100:Games account:/var/games:/bin/bash
currLine:7 currColumns:7 content:gdm:x:486:485:Gnome Display Manager daemon:/var/lib/gdm:/bin/false
currLine:8 currColumns:7 content:lp:x:4:7:Printing daemon:/var/spool/lpd:/bin/bash
currLine:9 currColumns:7 content:mail:x:8:12:Mailer daemon:/var/spool/clientmqueue:/bin/false
currLine:10 currColumns:7 content:man:x:13:62:Manual pages viewer:/var/cache/man:/bin/bash
currLine:11 currColumns:7 content:messagebus:x:499:499:User for D-Bus:/var/run/dbus:/bin/false
currLine:12 currColumns:7 content:news:x:9:13:News system:/etc/news:/bin/bash
currLine:13 currColumns:7 content:nobody:x:65534:65533:nobody:/var/lib/nobody:/bin/bash
currLine:14 currColumns:7 content:nscd:x:496:495:User for nscd:/run/nscd:/sbin/nologin
currLine:15 currColumns:7 content:ntp:x:74:492:NTP daemon:/var/lib/ntp:/bin/false
currLine:16 currColumns:7 content:openslp:x:494:2:openslp daemon:/var/lib/empty:/sbin/nologin
currLine:17 currColumns:7 content:polkitd:x:497:496:User for polkitd:/var/lib/polkit:/sbin/nologin
currLine:18 currColumns:7 content:postfix:x:51:51:Postfix Daemon:/var/spool/postfix:/bin/false
currLine:19 currColumns:7 content:pulse:x:490:489:PulseAudio daemon:/var/lib/pulseaudio:/sbin/nologin
currLine:20 currColumns:7 content:root:x:0:0:root:/root:/bin/zsh
currLine:21 currColumns:7 content:rpc:x:495:65534:user for rpcbind:/var/lib/empty:/sbin/nologin
currLine:22 currColumns:7 content:rtkit:x:491:490:RealtimeKit:/proc:/bin/false
currLine:23 currColumns:7 content:scard:x:487:487:Smart Card Reader:/var/run/pcscd:/usr/sbin/nologin
currLine:24 currColumns:7 content:sshd:x:498:498:SSH daemon:/var/lib/sshd:/bin/false
currLine:25 currColumns:7 content:statd:x:489:65534:NFS statd daemon:/var/lib/nfs:/sbin/nologin
currLine:26 currColumns:7 content:usbmux:x:493:65534:usbmuxd daemon:/var/lib/usbmuxd:/sbin/nologin
currLine:27 currColumns:7 content:uucp:x:10:14:Unix-to-Unix CoPy system:/etc/uucp:/bin/bash
currLine:28 currColumns:7 content:vnc:x:492:491:user for VNC:/var/lib/empty:/sbin/nologin
currLine:29 currColumns:7 content:wwwrun:x:30:8:WWW daemon apache:/var/lib/wwwrun:/bin/false
currLine:30 currColumns:7 content:edward:x:1000:100:Edward:/home/edward:/bin/zsh
currLine:31 currColumns:7 content:ftp-edward:x:1001:100::/home/ftp-edward:/bin/bash
currLine:32 currColumns:7 content:lighthttpd:x:1004:1000::/home/lighthttpd:/bin/bash
Filename:/etc/passwd Total:32

Likewise, we can format the output with the printf function to make it easier to read.
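As a minimal sketch of printf formatting, assuming two made-up passwd-style lines and an arbitrary column width of 8:

```shell
# %-8s left-justifies the name in an 8-character column; %s prints the shell.
# Unlike print, printf does not append a newline, so \n is written explicitly.
formatted=$(printf 'root:x:0:0:root:/root:/bin/zsh\nsshd:x:498:498:SSH daemon:/var/lib/sshd:/bin/false\n' |
  awk -F ':' '{printf "%-8s%s\n", $1, $7}')
printf '%s\n' "$formatted"
```

A fixed-width column keeps the second field aligned even when the names differ in length, which a tab separator does not guarantee.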

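0x03 awk Programming

Variables and Assignment

In addition to the built-in variables, awk supports user-defined variables.

Below we count the users in /etc/passwd. Here count is initialized to 1, so the final value is one more than the number of lines read (33 for the 32 entries). To get the actual user count, initialize it to 0, or leave it uninitialized, since an unset variable starts at 0.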

awk 'BEGIN {
  count = 1;
  print count;
}
{
  count++;
  print $0;
}
END {
  print "user count is "count;
}
' /etc/passwd
1
at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash
bin:x:1:1:bin:/bin:/bin/bash
daemon:x:2:2:Daemon:/sbin:/bin/bash
ftp:x:40:49:FTP account:/srv/ftp:/bin/bash
ftpsecure:x:488:65534:Secure FTP User:/var/lib/empty:/bin/false
games:x:12:100:Games account:/var/games:/bin/bash
gdm:x:486:485:Gnome Display Manager daemon:/var/lib/gdm:/bin/false
lp:x:4:7:Printing daemon:/var/spool/lpd:/bin/bash
mail:x:8:12:Mailer daemon:/var/spool/clientmqueue:/bin/false
man:x:13:62:Manual pages viewer:/var/cache/man:/bin/bash
messagebus:x:499:499:User for D-Bus:/var/run/dbus:/bin/false
news:x:9:13:News system:/etc/news:/bin/bash
nobody:x:65534:65533:nobody:/var/lib/nobody:/bin/bash
nscd:x:496:495:User for nscd:/run/nscd:/sbin/nologin
ntp:x:74:492:NTP daemon:/var/lib/ntp:/bin/false
openslp:x:494:2:openslp daemon:/var/lib/empty:/sbin/nologin
polkitd:x:497:496:User for polkitd:/var/lib/polkit:/sbin/nologin
postfix:x:51:51:Postfix Daemon:/var/spool/postfix:/bin/false
pulse:x:490:489:PulseAudio daemon:/var/lib/pulseaudio:/sbin/nologin
root:x:0:0:root:/root:/bin/zsh
rpc:x:495:65534:user for rpcbind:/var/lib/empty:/sbin/nologin
rtkit:x:491:490:RealtimeKit:/proc:/bin/false
scard:x:487:487:Smart Card Reader:/var/run/pcscd:/usr/sbin/nologin
sshd:x:498:498:SSH daemon:/var/lib/sshd:/bin/false
statd:x:489:65534:NFS statd daemon:/var/lib/nfs:/sbin/nologin
usbmux:x:493:65534:usbmuxd daemon:/var/lib/usbmuxd:/sbin/nologin
uucp:x:10:14:Unix-to-Unix CoPy system:/etc/uucp:/bin/bash
vnc:x:492:491:user for VNC:/var/lib/empty:/sbin/nologin
wwwrun:x:30:8:WWW daemon apache:/var/lib/wwwrun:/bin/false
edward:x:1000:100:Edward:/home/edward:/bin/zsh
ftp-edward:x:1001:100::/home/ftp-edward:/bin/bash
lighthttpd:x:1004:1000::/home/lighthttpd:/bin/bash
user count is 33
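For comparison, here is a sketch of two ways to get the exact count, run on three made-up lines rather than the real /etc/passwd: an uninitialized counter, and the built-in NR read inside END.

```shell
# Three made-up passwd-style lines stand in for the real file.
sample_users() {
  printf 'at:x:25\nbin:x:1\ndaemon:x:2\n'
}

# An uninitialized variable starts at 0, so count ends up equal to the line count.
counted=$(sample_users | awk '{count++} END {print "user count is " count}')

# NR still holds the number of the last record inside END, so no counter is needed.
from_nr=$(sample_users | awk 'END {print "user count is " NR}')

printf '%s\n%s\n' "$counted" "$from_nr"
```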

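Next we total the number of bytes occupied by the files in a directory. The blank line in the output comes from the first line of ls -l (the "total" line), which has no fifth field.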

ls -l | awk 'BEGIN {
  size = 0;
  printf("[start]Initial Size is %s\n",size);
}{
  print $5;
  size = size + $5;
}
END {
  printf("[end]Final Size is %s\n",size);
}'
[start]Initial Size is 0

2713
0
472
244
58464
0
0
0
26
0
0
31729
0
49548
46
49548
47650
11
[end]Final Size is 240451

To display the result in megabytes (M):

ls -l | awk 'BEGIN {
  size = 0;
  printf("[start]Initial Size is %s\n",size);
}{
  print $5;
  size = size + $5;
}
END {
  printf("[end]Final Size is %sM\n",size/1024/1024);
}'
[start]Initial Size is 0

2713
0
472
244
58464
0
0
0
26
0
0
31729
0
49548
46
49548
47650
11
[end]Final Size is 0.229312M
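A possible refinement, sketched on a made-up `ls -l`-style listing (the `listing` helper and its file names are hypothetical): skip the "total" line with an NR > 1 pattern and round the result with %.2f.

```shell
# Made-up listing: a "total" line followed by two files of 1 MiB and 0.5 MiB.
listing() {
  printf '%s\n' 'total 8'
  printf '%s\n' '-rw-r--r-- 1 root root 1048576 Aug  1 10:57 a.bin'
  printf '%s\n' '-rw-r--r-- 1 root root  524288 Aug  1 10:58 b.bin'
}

# NR > 1 skips the first line; %.2f keeps two decimal places.
size_mb=$(listing | awk 'NR > 1 {size += $5} END {printf "%.2fM\n", size / 1024 / 1024}')
printf '%s\n' "$size_mb"
```

Because size is only ever used as a number, it does not need to be initialized in BEGIN.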

Conditional Statements

if (expression) {
    statement;
    statement;
    ... ...
}

if (expression) {
    statement;
} else {
    statement2;
}

if (expression) {
    statement1;
} else if (expression1) {
    statement2;
} else {
    statement3;
}
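A concrete sketch of the if / else if / else form, classifying made-up passwd-style accounts by UID. The threshold of 1000 for regular users is an assumption for illustration and varies between systems.

```shell
# Field 3 is the UID. The 1000 cut-off is an assumed convention, not a rule.
classified=$(printf 'root:x:0\nsshd:x:498\nedward:x:1000\n' | awk -F ':' '{
  if ($3 == 0) {
    kind = "superuser"
  } else if ($3 < 1000) {
    kind = "system"
  } else {
    kind = "regular"
  }
  print $1, kind
}')
printf '%s\n' "$classified"
```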

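## Loop Statements
Loops work much the same way: awk provides while, do-while, and for loops with the same C-like syntax as the conditionals above.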
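A short sketch of a for loop and a while loop; the input line and the variable names are made up for illustration.

```shell
# for loop: walk the fields of the record from last to first.
reversed=$(printf 'a b c\n' | awk '{
  line = ""
  sep = ""
  for (i = NF; i >= 1; i--) {
    line = line sep $i
    sep = " "
  }
  print line
}')

# while loop: sum the integers 1 through 5 inside BEGIN, so no input is read.
total=$(awk 'BEGIN {
  i = 1
  while (i <= 5) {
    sum += i
    i++
  }
  print sum
}')

printf '%s\n%s\n' "$reversed" "$total"
```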

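## Arrays
Because array subscripts in awk can be numbers or strings, a subscript is usually called a key. Keys and values are stored together in an internal hash table. Since a hash table is not stored in order, array contents are not displayed in the order you might expect. Like variables, arrays are created automatically on first use, and awk likewise determines automatically whether an element holds a number or a string. Arrays are generally used to **collect information** from records, for example to **compute totals**, **count words**, or **track how many times a pattern is matched**.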
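As a minimal sketch of collecting information in an array, the following counts how many made-up accounts use each login shell. Because for (key in array) visits keys in an unspecified order, the result is piped through sort to make it stable.

```shell
# The shell path (field 2 of these made-up lines) is the array key.
# count[$2]++ creates the element on first use, starting from 0.
shell_counts=$(printf 'root:/bin/zsh\nbin:/bin/bash\nedward:/bin/zsh\nftp:/bin/bash\nsshd:/bin/false\n' |
  awk -F ':' '{count[$2]++} END {for (s in count) print s, count[s]}' |
  sort)
printf '%s\n' "$shell_counts"
```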