References:
https://blog.csdn.net/herecles/article/details/8152054
https://www.cnblogs.com/standby/p/8309994.html
The sample text is as follows:
cat words.txt
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
1. Counting word frequency with AWK
cat words.txt | awk '{for(i=1;i<=NF;i++){if($i ~ /\w/) valid++;\
count[$i]++}}END{print "valid words:"valid"\n";for(j in count)\
print j,count[j]}'
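# An if was added to filter for "word" characters, but the result is not ideal:
# without braces it only guards valid++, so count[$i]++ still runs for every field.
# In END, a for loop prints the contents of the count hash.
valid words:143  # Perl reports a different count for the same text (see section 2); I don't know what went wrong here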
-- 1
19
hard 1
1unts.
one 2
only 1
is 10
it 1
If 2
1nse.
special 1
aren't 1
are 1
ambiguity, 1
honking 1
Readability 1
way 2
of 3
In 1
1w.
easy 1
one-- 1
than 8
Special 1
*right* 1
refuse 1
preferably 1
that 1
be 3
Errors 1
Sparse 1
Complex 1
explain, 2
1ver.
1tch.
1rity.
bad 1
you're 1
Beautiful 1
There 1
1sted.
do 2
Unless 1
by 1
cases 1
better 8
Now 1
Explicit 1
face 1
often 1
unless 1
not 1
more 1
a 2
1ters
implementation 2
Tim 1
obvious 1
Although 3
let's 1
1.
1lently.
practicality 1
Namespaces 1
should 2
1mplex.
those! 1
great 1
2ea.
it's 1
Simple 1
1les.
enough 1
idea 1
explicitly 1
1lenced.
pass 1
Zen 1
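Two things in the output above can be explained from the commands themselves. The awk test `$i ~ /\w/` accepts any field that contains at least one word character, while the Perl test `next if /\W/` in section 2 rejects any field that contains a non-word character, so the two "valid" totals measure different things. The overwritten-looking lines such as `1unts.` are what a terminal typically shows when a field ends in a carriage return, which suggests that words.txt has Windows-style CRLF line endings. That is an assumption; the original post does not confirm it. A minimal sketch under that assumption, where the `count_words` helper is hypothetical and not part of the original post:

```shell
# Hypothetical helper: count word frequencies after removing carriage
# returns and trimming punctuation from both ends of each field.
count_words() {
    tr -d '\r' < "$1" |
    awk '{
        for (i = 1; i <= NF; i++) {
            w = $i
            # Trim leading and trailing non-alphanumeric characters,
            # so "counts." becomes "counts" and "--" becomes "".
            gsub(/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$/, "", w)
            if (w != "") count[w]++
        }
    }
    END {
        for (w in count) print w, count[w]
    }' | LC_ALL=C sort
}

# Usage: count_words words.txt
```

Inner punctuation is kept, so `aren't` stays one word. Whether `Complex` and `complex` should be merged is a separate choice that this sketch does not make.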
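2. Counting word frequency with Perl
Perl seems to handle this task better, but one point took me a long time to work out. The command uses two Perl statements that do different jobs and cannot run inside the same loop. The first uses -alne (-n is equivalent to wrapping the code in while (<>)) to iterate over the words in words.txt, and that loop has to finish first. The second needs no per-line loop; it only prints the count hash with foreach, so it is wrapped in an END block, which runs once after the loop has ended.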
cat words.txt | perl -alne '{foreach (split) {$total++; next if /\W/;
$valid++; $count{$_}++;}}' -e 'END{print "total:$total words,valid:$valid words\n";
foreach $word (sort keys %count) {print " $word ==> $count{$word}\n"}}'
total:144 words,valid:113 words
Although ==> 3
Beautiful ==> 1
Complex ==> 1
Errors ==> 1
Explicit ==> 1
Flat ==> 1
If ==> 2
There ==> 1
Tim ==> 1
Unless ==> 1
Zen ==> 1
a ==> 2
and ==> 1
are ==> 1
at ==> 1
bad ==> 1
enough ==> 1
explicitly ==> 1
face ==> 1
first ==> 1
good ==> 1
great ==> 1
hard ==> 1
honking ==> 1
idea ==> 1
implementation ==> 2
is ==> 10
it ==> 1
may ==> 2
more ==> 1
never ==> 2
not ==> 1
obvious ==> 1
of ==> 3
often ==> 1
one ==> 2
only ==> 1
pass ==> 1
practicality ==> 1
preferably ==> 1
refuse ==> 1