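awk is a tool for analyzing and processing text files.
0x00 Usage
awk 'pattern { action }' [filenames]
Here pattern is most commonly a regular expression, enclosed in slashes, that selects the records we want to work on.
action is the operation performed on each record that matches.
0x01 Examples
awk works like this: it reads one record at a time, with records separated by the newline character \n. It then splits the record into fields using the field separator (by default, any run of spaces or tabs) and fills in the field variables: $0 is the whole record, $1 is the first field, and $n is the nth field.
First, we run last -n 5: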
last -n 5
root pts/4 172.20.3.158 Mon Aug 1 11:20 still logged in
root pts/3 172.20.3.158 Mon Aug 1 10:58 still logged in
root pts/2 172.20.3.158 Mon Aug 1 10:57 still logged in
root pts/1 172.20.3.158 Mon Aug 1 10:57 still logged in
root pts/0 172.20.3.158 Mon Aug 1 10:57 still logged in
wtmp begins Mon Apr 25 17:46:29 2016
Splitting that output with the default separator and printing the first and second fields gives:
last -n 5 | awk '{print $1,$2}'
root pts/4
root pts/3
root pts/2
root pts/1
root pts/0
wtmp begins
Next we set the field separator ourselves, which is normally done with the -F option, and print the selected fields separated by a tab (\t).
cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
at /bin/bash
bin /bin/bash
daemon /bin/bash
ftp /bin/bash
ftpsecure /bin/false
games /bin/bash
gdm /bin/false
lp /bin/bash
mail /bin/false
man /bin/bash
messagebus /bin/false
news /bin/bash
nobody /bin/bash
nscd /sbin/nologin
ntp /bin/false
openslp /sbin/nologin
polkitd /sbin/nologin
postfix /bin/false
pulse /sbin/nologin
root /bin/zsh
rpc /sbin/nologin
rtkit /bin/false
scard /usr/sbin/nologin
sshd /bin/false
statd /sbin/nologin
usbmux /sbin/nologin
uucp /bin/bash
vnc /sbin/nologin
wwwrun /bin/false
edward /bin/zsh
ftp-edward /bin/bash
lighthttpd /bin/bash
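Next we use the BEGIN, main (PROC), and END blocks to control the order of execution. awk runs the BEGIN block first, before reading any input. It then reads the input one record at a time, splitting on \n, fills in the fields, and runs the main block for each record. After the last record has been processed, it runs the END block. BEGIN and END are awk keywords; PROC is only the name used here for the unlabeled main block.
Now we modify the program above so that it first prints the header name,shell and finishes by printing Action Finished: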
cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "Action Finished"}'
name,shell
at,/bin/bash
bin,/bin/bash
daemon,/bin/bash
ftp,/bin/bash
ftpsecure,/bin/false
games,/bin/bash
gdm,/bin/false
lp,/bin/bash
mail,/bin/false
man,/bin/bash
messagebus,/bin/false
news,/bin/bash
nobody,/bin/bash
nscd,/sbin/nologin
ntp,/bin/false
openslp,/sbin/nologin
polkitd,/sbin/nologin
postfix,/bin/false
pulse,/sbin/nologin
root,/bin/zsh
rpc,/sbin/nologin
rtkit,/bin/false
scard,/usr/sbin/nologin
sshd,/bin/false
statd,/sbin/nologin
usbmux,/sbin/nologin
uucp,/bin/bash
vnc,/sbin/nologin
wwwrun,/bin/false
edward,/bin/zsh
ftp-edward,/bin/bash
lighthttpd,/bin/bash
Action Finished
So how do we get the shell of the root account from /etc/passwd?
awk -F ':' '/root/{print $7}' /etc/passwd
/bin/zsh
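Here the text between the slashes is the pattern: if the current line matches the regular expression root, the action is applied to that line. Note that /root/ matches root anywhere in the line, not only in the user name field.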
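Because a regular expression pattern is tested against the whole line, it can select more than the intended account. The following is a minimal sketch of the difference between a regex pattern and an exact comparison on the first field. The two sample lines and the chroot-user account are hypothetical, not taken from the system above.

```shell
# Hypothetical sample input in /etc/passwd format.
sample='root:x:0:0:root:/root:/bin/zsh
chroot-user:x:1002:100::/home/chroot-user:/bin/bash'

# Regex match against the whole line: both lines contain "root".
regex_hits=$(printf '%s\n' "$sample" | awk -F ':' '/root/ {print $7}')

# Exact comparison against the first field: only the root account matches.
exact_hits=$(printf '%s\n' "$sample" | awk -F ':' '$1 == "root" {print $7}')

printf 'regex:\n%s\nexact:\n%s\n' "$regex_hits" "$exact_hits"
```

An expression such as $1 == "root" is also a valid pattern, so the overall pattern { action } form stays the same.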
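0x02 Built-in variables
awk provides a number of built-in variables that hold information about the environment, and they can be modified.
ARGC number of command-line arguments
ARGV array of command-line arguments
ENVIRON associative array of the system environment variables
FILENAME name of the file awk is currently reading
FNR record number within the current file
FS input field separator, equivalent to the -F command-line option
NF number of fields in the current record
NR number of records read so far
OFS output field separator
ORS output record separator
RS input record separator
Let's try them out: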
awk -F ':' 'BEGIN {print "ARGC:" ARGC " ARGV:" ARGV[0]","ARGV[1] " Filename:" FILENAME " Total:" FNR}{print "currLine:" NR " currColumns:" NF " content:" $0}' /etc/passwd
ARGC:2 ARGV:awk,/etc/passwd Filename: Total:0
currLine:1 currColumns:7 content:at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash
currLine:2 currColumns:7 content:bin:x:1:1:bin:/bin:/bin/bash
currLine:3 currColumns:7 content:daemon:x:2:2:Daemon:/sbin:/bin/bash
currLine:4 currColumns:7 content:ftp:x:40:49:FTP account:/srv/ftp:/bin/bash
currLine:5 currColumns:7 content:ftpsecure:x:488:65534:Secure FTP User:/var/lib/empty:/bin/false
currLine:6 currColumns:7 content:games:x:12:100:Games account:/var/games:/bin/bash
currLine:7 currColumns:7 content:gdm:x:486:485:Gnome Display Manager daemon:/var/lib/gdm:/bin/false
currLine:8 currColumns:7 content:lp:x:4:7:Printing daemon:/var/spool/lpd:/bin/bash
currLine:9 currColumns:7 content:mail:x:8:12:Mailer daemon:/var/spool/clientmqueue:/bin/false
currLine:10 currColumns:7 content:man:x:13:62:Manual pages viewer:/var/cache/man:/bin/bash
currLine:11 currColumns:7 content:messagebus:x:499:499:User for D-Bus:/var/run/dbus:/bin/false
currLine:12 currColumns:7 content:news:x:9:13:News system:/etc/news:/bin/bash
currLine:13 currColumns:7 content:nobody:x:65534:65533:nobody:/var/lib/nobody:/bin/bash
currLine:14 currColumns:7 content:nscd:x:496:495:User for nscd:/run/nscd:/sbin/nologin
currLine:15 currColumns:7 content:ntp:x:74:492:NTP daemon:/var/lib/ntp:/bin/false
currLine:16 currColumns:7 content:openslp:x:494:2:openslp daemon:/var/lib/empty:/sbin/nologin
currLine:17 currColumns:7 content:polkitd:x:497:496:User for polkitd:/var/lib/polkit:/sbin/nologin
currLine:18 currColumns:7 content:postfix:x:51:51:Postfix Daemon:/var/spool/postfix:/bin/false
currLine:19 currColumns:7 content:pulse:x:490:489:PulseAudio daemon:/var/lib/pulseaudio:/sbin/nologin
currLine:20 currColumns:7 content:root:x:0:0:root:/root:/bin/zsh
currLine:21 currColumns:7 content:rpc:x:495:65534:user for rpcbind:/var/lib/empty:/sbin/nologin
currLine:22 currColumns:7 content:rtkit:x:491:490:RealtimeKit:/proc:/bin/false
currLine:23 currColumns:7 content:scard:x:487:487:Smart Card Reader:/var/run/pcscd:/usr/sbin/nologin
currLine:24 currColumns:7 content:sshd:x:498:498:SSH daemon:/var/lib/sshd:/bin/false
currLine:25 currColumns:7 content:statd:x:489:65534:NFS statd daemon:/var/lib/nfs:/sbin/nologin
currLine:26 currColumns:7 content:usbmux:x:493:65534:usbmuxd daemon:/var/lib/usbmuxd:/sbin/nologin
currLine:27 currColumns:7 content:uucp:x:10:14:Unix-to-Unix CoPy system:/etc/uucp:/bin/bash
currLine:28 currColumns:7 content:vnc:x:492:491:user for VNC:/var/lib/empty:/sbin/nologin
currLine:29 currColumns:7 content:wwwrun:x:30:8:WWW daemon apache:/var/lib/wwwrun:/bin/false
currLine:30 currColumns:7 content:edward:x:1000:100:Edward:/home/edward:/bin/zsh
currLine:31 currColumns:7 content:ftp-edward:x:1001:100::/home/ftp-edward:/bin/bash
currLine:32 currColumns:7 content:lighthttpd:x:1004:1000::/home/lighthttpd:/bin/bash
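As the first line of output shows, the BEGIN block runs before any input has been read, so FILENAME is still empty and the record count FNR is 0. Values that depend on the input are better printed in the END block, so we change the program to: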
awk -F ':' 'BEGIN {
print "ARGC:" ARGC
print "ARGV:"
for (i=0;i<ARGC;i++)
print ARGV[i]
}{
print "currLine:" NR " currColumns:" NF " content:" $0
} END {
print "Filename:" FILENAME " Total:" FNR
}' /etc/passwd
ARGC:2
ARGV:
awk
/etc/passwd
currLine:1 currColumns:7 content:at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash
currLine:2 currColumns:7 content:bin:x:1:1:bin:/bin:/bin/bash
currLine:3 currColumns:7 content:daemon:x:2:2:Daemon:/sbin:/bin/bash
currLine:4 currColumns:7 content:ftp:x:40:49:FTP account:/srv/ftp:/bin/bash
currLine:5 currColumns:7 content:ftpsecure:x:488:65534:Secure FTP User:/var/lib/empty:/bin/false
currLine:6 currColumns:7 content:games:x:12:100:Games account:/var/games:/bin/bash
currLine:7 currColumns:7 content:gdm:x:486:485:Gnome Display Manager daemon:/var/lib/gdm:/bin/false
currLine:8 currColumns:7 content:lp:x:4:7:Printing daemon:/var/spool/lpd:/bin/bash
currLine:9 currColumns:7 content:mail:x:8:12:Mailer daemon:/var/spool/clientmqueue:/bin/false
currLine:10 currColumns:7 content:man:x:13:62:Manual pages viewer:/var/cache/man:/bin/bash
currLine:11 currColumns:7 content:messagebus:x:499:499:User for D-Bus:/var/run/dbus:/bin/false
currLine:12 currColumns:7 content:news:x:9:13:News system:/etc/news:/bin/bash
currLine:13 currColumns:7 content:nobody:x:65534:65533:nobody:/var/lib/nobody:/bin/bash
currLine:14 currColumns:7 content:nscd:x:496:495:User for nscd:/run/nscd:/sbin/nologin
currLine:15 currColumns:7 content:ntp:x:74:492:NTP daemon:/var/lib/ntp:/bin/false
currLine:16 currColumns:7 content:openslp:x:494:2:openslp daemon:/var/lib/empty:/sbin/nologin
currLine:17 currColumns:7 content:polkitd:x:497:496:User for polkitd:/var/lib/polkit:/sbin/nologin
currLine:18 currColumns:7 content:postfix:x:51:51:Postfix Daemon:/var/spool/postfix:/bin/false
currLine:19 currColumns:7 content:pulse:x:490:489:PulseAudio daemon:/var/lib/pulseaudio:/sbin/nologin
currLine:20 currColumns:7 content:root:x:0:0:root:/root:/bin/zsh
currLine:21 currColumns:7 content:rpc:x:495:65534:user for rpcbind:/var/lib/empty:/sbin/nologin
currLine:22 currColumns:7 content:rtkit:x:491:490:RealtimeKit:/proc:/bin/false
currLine:23 currColumns:7 content:scard:x:487:487:Smart Card Reader:/var/run/pcscd:/usr/sbin/nologin
currLine:24 currColumns:7 content:sshd:x:498:498:SSH daemon:/var/lib/sshd:/bin/false
currLine:25 currColumns:7 content:statd:x:489:65534:NFS statd daemon:/var/lib/nfs:/sbin/nologin
currLine:26 currColumns:7 content:usbmux:x:493:65534:usbmuxd daemon:/var/lib/usbmuxd:/sbin/nologin
currLine:27 currColumns:7 content:uucp:x:10:14:Unix-to-Unix CoPy system:/etc/uucp:/bin/bash
currLine:28 currColumns:7 content:vnc:x:492:491:user for VNC:/var/lib/empty:/sbin/nologin
currLine:29 currColumns:7 content:wwwrun:x:30:8:WWW daemon apache:/var/lib/wwwrun:/bin/false
currLine:30 currColumns:7 content:edward:x:1000:100:Edward:/home/edward:/bin/zsh
currLine:31 currColumns:7 content:ftp-edward:x:1001:100::/home/ftp-edward:/bin/bash
currLine:32 currColumns:7 content:lighthttpd:x:1004:1000::/home/lighthttpd:/bin/bash
Filename:/etc/passwd Total:32
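The runs above exercise ARGC, ARGV, FILENAME, FNR, NR, and NF, but not the separator variables. As a small sketch of FS and OFS, the following sets both in the BEGIN block and prints a few fields. The two input records are hypothetical.

```shell
# Hypothetical two-record input; fields are separated by ":".
result=$(printf 'edward:x:1000\nroot:x:0\n' | awk '
BEGIN {
    FS = ":"     # input field separator, same effect as -F ":"
    OFS = "\t"   # placed between the comma-separated arguments of print
}
{
    # NR is the running record number, NF the field count, $NF the last field.
    print NR, NF, $1, $NF
}')
printf '%s\n' "$result"
```

OFS only takes effect when the arguments of print are separated by commas; arguments written side by side are concatenated without any separator.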
The printf function can also be used to format the output, which makes it easier to read.
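As an illustration, here is a minimal sketch that prints the name, UID, and shell as aligned columns. The column widths and the two sample lines are assumptions chosen for the example.

```shell
# Hypothetical input in /etc/passwd format.
result=$(printf 'root:x:0:0:root:/root:/bin/zsh\nedward:x:1000:100:Edward:/home/edward:/bin/zsh\n' |
awk -F ':' '
BEGIN { printf "%-10s %6s %s\n", "name", "uid", "shell" }   # header row
      { printf "%-10s %6d %s\n", $1, $3, $7 }')             # one row per record
printf '%s\n' "$result"
```

Unlike print, printf does not append a newline on its own, so each format string ends with \n.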
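0x03 awk programming
## Variables and assignment
Besides the built-in variables, awk also lets you define your own variables.
Below we count the users in /etc/passwd. count is initialized to 1 here; without initialization it would start at 0. Because counting starts at 1, the total printed at the end is one more than the number of records.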
awk 'BEGIN {
count = 1;
print count;
}
{
count++;
print $0;
}
END {
print "user count is "count;
}
' /etc/passwd
1
at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash
bin:x:1:1:bin:/bin:/bin/bash
daemon:x:2:2:Daemon:/sbin:/bin/bash
ftp:x:40:49:FTP account:/srv/ftp:/bin/bash
ftpsecure:x:488:65534:Secure FTP User:/var/lib/empty:/bin/false
games:x:12:100:Games account:/var/games:/bin/bash
gdm:x:486:485:Gnome Display Manager daemon:/var/lib/gdm:/bin/false
lp:x:4:7:Printing daemon:/var/spool/lpd:/bin/bash
mail:x:8:12:Mailer daemon:/var/spool/clientmqueue:/bin/false
man:x:13:62:Manual pages viewer:/var/cache/man:/bin/bash
messagebus:x:499:499:User for D-Bus:/var/run/dbus:/bin/false
news:x:9:13:News system:/etc/news:/bin/bash
nobody:x:65534:65533:nobody:/var/lib/nobody:/bin/bash
nscd:x:496:495:User for nscd:/run/nscd:/sbin/nologin
ntp:x:74:492:NTP daemon:/var/lib/ntp:/bin/false
openslp:x:494:2:openslp daemon:/var/lib/empty:/sbin/nologin
polkitd:x:497:496:User for polkitd:/var/lib/polkit:/sbin/nologin
postfix:x:51:51:Postfix Daemon:/var/spool/postfix:/bin/false
pulse:x:490:489:PulseAudio daemon:/var/lib/pulseaudio:/sbin/nologin
root:x:0:0:root:/root:/bin/zsh
rpc:x:495:65534:user for rpcbind:/var/lib/empty:/sbin/nologin
rtkit:x:491:490:RealtimeKit:/proc:/bin/false
scard:x:487:487:Smart Card Reader:/var/run/pcscd:/usr/sbin/nologin
sshd:x:498:498:SSH daemon:/var/lib/sshd:/bin/false
statd:x:489:65534:NFS statd daemon:/var/lib/nfs:/sbin/nologin
usbmux:x:493:65534:usbmuxd daemon:/var/lib/usbmuxd:/sbin/nologin
uucp:x:10:14:Unix-to-Unix CoPy system:/etc/uucp:/bin/bash
vnc:x:492:491:user for VNC:/var/lib/empty:/sbin/nologin
wwwrun:x:30:8:WWW daemon apache:/var/lib/wwwrun:/bin/false
edward:x:1000:100:Edward:/home/edward:/bin/zsh
ftp-edward:x:1001:100::/home/ftp-edward:/bin/bash
lighthttpd:x:1004:1000::/home/lighthttpd:/bin/bash
user count is 33
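The run above reports 33 for a 32-line file because count starts at 1. A minimal sketch of the same count without the off-by-one, relying on awk treating an unset variable as 0, could look like this. The three-line input is a hypothetical stand-in for /etc/passwd.

```shell
# Hypothetical three-line input standing in for /etc/passwd.
count=$(printf 'at:x:25\nbin:x:1\nroot:x:0\n' | awk '
{ count++ }                 # count is unset at first, so it starts from 0
END { print count + 0 }')   # adding 0 forces a numeric 0 even for empty input
echo "user count is $count"
```

For a plain record count, printing NR in the END block gives the same number.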
Next we total the number of bytes used by the files in a directory.
ls -l | awk 'BEGIN {
size = 0;
printf("[start]Initial Size is %s\n",size);
}{
print $5;
size = size + $5;
}
END {
printf("[end]Final Size is %s\n",size);
}'
[start]Initial Size is 0
2713
0
472
244
58464
0
0
0
26
0
0
31729
0
49548
46
49548
47650
11
[end]Final Size is 240451
To show the total in megabytes (M) instead:
ls -l | awk 'BEGIN {
size = 0;
printf("[start]Initial Size is %s\n",size);
}{
print $5;
size = size + $5;
}
END {
printf("[end]Final Size is %sM\n",size/1024/1024);
}'
[start]Initial Size is 0
2713
0
472
244
58464
0
0
0
26
0
0
31729
0
49548
46
49548
47650
11
[end]Final Size is 0.229312M
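The scripts above add $5 for every line that ls -l prints, including the total line and any directories. A hedged variant that counts only regular files and rounds the result with printf might look like the following. The listing is hypothetical sample data, and the file names in it are made up.

```shell
# Hypothetical ls -l output.
listing='total 12
-rw-r--r-- 1 edward users 2713 Aug  1 10:57 notes.txt
drwxr-xr-x 2 edward users 4096 Aug  1 10:57 src
-rw-r--r-- 1 edward users 1048576 Aug  1 10:58 data.bin'

result=$(printf '%s\n' "$listing" | awk '
$1 ~ /^-/ { size += $5 }    # regular files only; skips "total" and directories
END { printf "%d bytes, %.2f MiB\n", size, size / 1024 / 1024 }')
printf '%s\n' "$result"
```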
## Conditional statements
if (expression) {
statement;
statement;
... ...
}
if (expression) {
statement;
} else {
statement2;
}
if (expression) {
statement1;
} else if (expression1) {
statement2;
} else {
statement3;
}
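As a concrete use of the if / else if / else form, the sketch below classifies accounts by UID. The 1000 threshold for regular users is an assumption (a common Linux convention, not something stated above), and the input lines are hypothetical.

```shell
# Hypothetical name:password:uid records.
result=$(printf 'root:x:0\nbin:x:1\nedward:x:1000\n' | awk -F ':' '
{
    if ($3 == 0) {
        kind = "superuser"
    } else if ($3 < 1000) {      # assumed cutoff for system accounts
        kind = "system"
    } else {
        kind = "regular"
    }
    print $1, kind
}')
printf '%s\n' "$result"
```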
## Loop statements
Loops work in much the same way: awk supports while, do-while, and for loops with C-like syntax.
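A short sketch of a for loop and a while loop over the fields of one record follows. The input line is made up for the example.

```shell
result=$(echo '3 5 7' | awk '
{
    sum = 0
    for (i = 1; i <= NF; i++)    # for: add up every field
        sum += $i

    rev = ""
    i = NF
    while (i > 0) {              # while: concatenate the fields in reverse
        rev = rev $i
        i--
    }
    print sum, rev
}')
echo "$result"
```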
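## Arrays
Because array subscripts in awk can be numbers or strings, a subscript is usually called a key. Keys and values are stored internally in a hash table. Since a hash table does not preserve insertion order, the elements may come out in an order you did not expect when you print the array. Like variables, arrays are created automatically on first use, and awk works out on its own whether an element holds a number or a string. In general, awk arrays are used to **collect information** from records: for example to **compute totals**, **count words**, or **track how many times a pattern is matched**.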
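To make the counting use case concrete, here is a minimal sketch that counts how many accounts use each login shell. The name:shell input lines are hypothetical, and the output is piped through sort because the for (key in array) traversal order is unspecified.

```shell
# Hypothetical name:shell records.
result=$(printf 'at:/bin/bash\ngdm:/bin/false\nroot:/bin/zsh\nbin:/bin/bash\n' |
awk -F ':' '
{ count[$2]++ }                  # the element is created on first use
END {
    for (shell in count)         # iterates over the keys in no fixed order
        print count[shell], shell
}' | sort -k2)
printf '%s\n' "$result"
```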