Translate from Efficient Counter in Java

Java中的高效計(jì)數(shù)器

你可能經(jīng)常需要統(tǒng)計(jì)一段文本或數(shù)據(jù)庫中某些東西（例如單詞）的出現(xiàn)頻率。在Java中，使用HashMap可以很簡單的實(shí)現(xiàn)這么一個(gè)計(jì)數(shù)器。
這篇文章將會(huì)比較幾種實(shí)現(xiàn)計(jì)數(shù)器的方法，最后，得出最有效率的一個(gè)。

1. 簡單計(jì)數(shù)器

String s = "one two three two three three";
String[] sArr = s.split(" ");
 
//naive approach     
HashMap<String, Integer> counter = new HashMap<String, Integer>();
 
for (String a : sArr) {
    if (counter.containsKey(a)) {
        int oldValue = counter.get(a);
        counter.put(a, oldValue + 1);
    } else {
        counter.put(a, 1);
    }
}

每次循環(huán)，檢查map中是否已經(jīng)有該字符，如果有則value+1；如果沒有，則放入map,把value置為1；
這種方法很簡單，但是效率不夠，體現(xiàn)在以下兩點(diǎn)：

當(dāng)key值已經(jīng)存在時(shí)，調(diào)用了兩次方法：containsKey()、get()，這意味著要在map中檢索兩次
由于Integer是不可變的，每個(gè)循環(huán)增加key的value時(shí)會(huì)創(chuàng)建一個(gè)新的對象

2. 優(yōu)化計(jì)數(shù)器

為了避免Integer的“不可變”帶來的重新創(chuàng)建對象的問題，我們創(chuàng)建一個(gè)“可變”的Integer對象：

class MutableInteger {
 
    private int val;
 
    public MutableInteger(int val) {
        this.val = val;
    }
 
    public int get() {
        return val;
    }
 
    public void set(int val) {
        this.val = val;
    }
 
    //used to print value convinently
    public String toString(){
        return Integer.toString(val);
    }
}

通過這個(gè)可變的Integer對象，改進(jìn)后的計(jì)數(shù)器代碼如下：

HashMap<String, MutableInteger> newCounter = new HashMap<String, MutableInteger>(); 
 
for (String a : sArr) {
    if (newCounter.containsKey(a)) {
        MutableInteger oldValue = newCounter.get(a);
        oldValue.set(oldValue.get() + 1);
    } else {
        newCounter.put(a, new MutableInteger(1));
    }
}

優(yōu)化了兩個(gè)地方：

每次value+1時(shí)不需要重新創(chuàng)建Integer對象了
改變value值不需要重新put()進(jìn)去

但是，這個(gè)方法仍然進(jìn)行了兩次檢索

3. 高效計(jì)數(shù)器

由于 HashMap.put(key, value)方法會(huì)返回這個(gè)key值原有的value，我們可以利用這一點(diǎn)避免兩次檢索
（譯者注：但是每次循環(huán)又增加了一個(gè)對象創(chuàng)建的開銷）

HashMap<String, MutableInteger> efficientCounter = new HashMap<String, MutableInteger>();
 
for (String a : sArr) {
    MutableInteger initValue = new MutableInteger(1);
    MutableInteger oldValue = efficientCounter.put(a, initValue);
 
    if(oldValue != null){
        initValue.set(oldValue.get() + 1);
    }
}

4. 三種方法的效率表現(xiàn)

為了體現(xiàn)三種方法的效率，通過代碼循環(huán)了一百萬次進(jìn)行測試，結(jié)果如下：

Naive Approach : 222796000
Better Approach: 117283000
Efficient Approach: 96374000

結(jié)果可以近似表達(dá)為：223:117:96，簡單計(jì)數(shù)器和優(yōu)良計(jì)數(shù)器之間的差距很大，這證明了創(chuàng)建對象的時(shí)間花費(fèi)很大！

String s = "one two three two three three";
String[] sArr = s.split(" ");
 
long startTime = 0;
long endTime = 0;
long duration = 0;
 
// naive approach
startTime = System.nanoTime();
HashMap<String, Integer> counter = new HashMap<String, Integer>();
 
for (int i = 0; i < 1000000; i++)
    for (String a : sArr) {
        if (counter.containsKey(a)) {
            int oldValue = counter.get(a);
            counter.put(a, oldValue + 1);
        } else {
            counter.put(a, 1);
        }
    }
 
endTime = System.nanoTime();
duration = endTime - startTime;
System.out.println("Naive Approach :  " + duration);
 
// better approach
startTime = System.nanoTime();
HashMap<String, MutableInteger> newCounter = new HashMap<String, MutableInteger>();
 
for (int i = 0; i < 1000000; i++)
    for (String a : sArr) {
        if (newCounter.containsKey(a)) {
            MutableInteger oldValue = newCounter.get(a);
            oldValue.set(oldValue.get() + 1);
        } else {
            newCounter.put(a, new MutableInteger(1));
        }
    }
 
endTime = System.nanoTime();
duration = endTime - startTime;
System.out.println("Better Approach:  " + duration);
 
// efficient approach
startTime = System.nanoTime();
 
HashMap<String, MutableInteger> efficientCounter = new HashMap<String, MutableInteger>();
 
for (int i = 0; i < 1000000; i++)
    for (String a : sArr) {
        MutableInteger initValue = new MutableInteger(1);
        MutableInteger oldValue = efficientCounter.put(a, initValue);
 
        if (oldValue != null) {
            initValue.set(oldValue.get() + 1);
        }
    }
 
endTime = System.nanoTime();
duration = endTime - startTime;
System.out.println("Efficient Approach:  " + duration);

5. Keith網(wǎng)站上的更好方法

增加了三個(gè)測試：

重構(gòu)優(yōu)良計(jì)數(shù)器，用get方法取代了containKey，這樣HashMap里面的兩次檢索減成了一個(gè)
取消自定義可變類型的Integer類，用Java中的AtomicInteger
根據(jù)How to Do Text Analysis with Java 一書，value改為用單個(gè)int數(shù)組存儲(chǔ)，可使用更少內(nèi)存。

我運(yùn)行這個(gè)改進(jìn)測試案例三次，取結(jié)果中最小的值來避免其他程序的影響。注意，你不能讓你的程序受到太多影響干擾。例如內(nèi)存不足會(huì)讓垃圾回收機(jī)制器進(jìn)行回收。

Naive: 201716122
Better Approach: 112259166
Efficient Approach: 93066471
Better Approach (without containsKey): 69578496
Better Approach (without containsKey, with AtomicInteger): 94313287
Better Approach (without containsKey, with int[]): 65877234

以下是代碼
更好的方法（取消使用containsKey）:

HashMap<String, MutableInteger> efficientCounter2 = new HashMap<String, MutableInteger>();
for (int i = 0; i < NUM_ITERATIONS; i++) {
    for (String a : sArr) {
        MutableInteger value = efficientCounter2.get(a);
 
        if (value != null) {
            value.set(value.get() + 1);
        } else {
            efficientCounter2.put(a, new MutableInteger(1));
        }
    }
}

更好的方法（取消使用containsKey，用AtomicInteger）:

HashMap<String, AtomicInteger> atomicCounter = new HashMap<String, AtomicInteger>();
for (int i = 0; i < NUM_ITERATIONS; i++) {
    for (String a : sArr) {
        AtomicInteger value = atomicCounter.get(a);
 
        if (value != null) {
            value.incrementAndGet();
        } else {
            atomicCounter.put(a, new AtomicInteger(1));
        }
    }
}

更好的方法（取消使用containsKey，用int[]）:

HashMap<String, int[]> intCounter = new HashMap<String, int[]>();
for (int i = 0; i < NUM_ITERATIONS; i++) {
    for (String a : sArr) {
        int[] valueWrapper = intCounter.get(a);
 
        if (valueWrapper == null) {
            intCounter.put(a, new int[] { 1 });
        } else {
            valueWrapper[0]++;
        }
    }
}

6. 結(jié)論

efficient-counter.png

從圖可以看出，最終勝利者是最后一個(gè)使用int[]數(shù)組的方法

參考：
1.Most efficient way to increment a Map value in Java.
2.[HashMap.put()](http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html#put(K, V))

7. 譯者補(bǔ)充

這里進(jìn)一步對高效計(jì)數(shù)器和改進(jìn)版的高效計(jì)數(shù)器進(jìn)行一下對比
測試字符串"one two three two three three"，都采用int[]數(shù)組存儲(chǔ)

原本高效計(jì)數(shù)器：去除了兩次檢索，但每次循環(huán)都聲明了一個(gè)新對象

for (String string : sArray) {
    int[] n=new int[]{1};
    int[] num = map.put(string,n);
    if (num != null)
        n[0]=++num[0];
}

網(wǎng)站改進(jìn)版：需要一次檢索，但沒有聲明多余的對象

for (String  key: sArray) {
    int[] n=map.get(key);
    if(n==null)
        map.put(key, new int[]{1});
    else
        n[0]++;
}

三次運(yùn)行得出最終耗時(shí)結(jié)果

48396425
26244898

在這個(gè)例子我們似乎可以看出，依舊是改進(jìn)版的計(jì)數(shù)器更高效。
但如果把字符串延長10倍，再次運(yùn)行

21581658187
22019343923

此時(shí)兩種方法時(shí)間花費(fèi)特別接近，如果把字符串繼續(xù)延長，改進(jìn)版的效率則不一定比原來的高。

所以這里得出我的結(jié)論：
沒有最高效的計(jì)數(shù)器，只有最合適的計(jì)數(shù)器。在字符串比較小的情況下，本文末尾結(jié)論的計(jì)數(shù)器最高效。但字符串增加， HashMap檢索代價(jià)增加的情況下，則第一種高效計(jì)數(shù)器更好。

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

Simple Java—Collections（一）Java高效計(jì)數(shù)器

Simple Java—Collections（一）Java高效計(jì)數(shù)器

Java中的高效計(jì)數(shù)器

1. 簡單計(jì)數(shù)器

2. 優(yōu)化計(jì)數(shù)器

3. 高效計(jì)數(shù)器

4. 三種方法的效率表現(xiàn)

5. Keith網(wǎng)站上的更好方法

6. 結(jié)論

7. 譯者補(bǔ)充

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

Simple Java—Collections（一）Java高效計(jì)數(shù)器

Java中的高效計(jì)數(shù)器

1. 簡單計(jì)數(shù)器

2. 優(yōu)化計(jì)數(shù)器

3. 高效計(jì)數(shù)器

4. 三種方法的效率表現(xiàn)

5. Keith網(wǎng)站上的更好方法

6. 結(jié)論

7. 譯者補(bǔ)充

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av