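How to count characters or words in multilingual text:
1. Count Chinese by the number of characters
The regular expression for Chinese characters is "[\u4e00-\u9fa5]". Counting every match of this pattern gives the number of Chinese characters.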
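As a minimal sketch of step 1, the Chinese characters can be counted by iterating over the matches of that pattern. The class name ChineseCharCounter and the method name count are hypothetical and used only for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class ChineseCharCounter {
    // Same range as in the text: U+4E00 through U+9FA5.
    private static final Pattern CHINESE = Pattern.compile("[\\u4e00-\\u9fa5]");

    // Returns how many characters of the input fall inside the range.
    static int count(String text) {
        if (text == null) {
            return 0;
        }
        Matcher matcher = CHINESE.matcher(text);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Two Chinese characters followed by ", world".
        System.out.println(count("\u4f60\u597d, world")); // prints 2
    }
}
```

Characters outside this range, such as Latin letters, digits, and punctuation, are simply not matched.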
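2. Count words in other languages by splitting on punctuation and whitespace
The regular expression for punctuation, symbols, and whitespace is "[\\p{P}\\p{S}\\p{Z}\\s]+". After removing the Chinese characters, split the remaining text on this pattern and count the resulting words. (Java's regex syntax does not require the braces, so "[\\pP\\pS\\pZ\\s]+" also works there, but Android throws an exception if they are omitted.)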
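Step 2 can be sketched on its own as follows, assuming the input no longer contains Chinese characters. The class name SeparatorSplitDemo and the method name countWords are hypothetical. The braced form of the pattern is used so the same string also works on Android:

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class SeparatorSplitDemo {
    // One or more punctuation, symbol, separator, or whitespace characters.
    private static final Pattern SEPARATORS =
            Pattern.compile("[\\p{P}\\p{S}\\p{Z}\\s]+");

    // Counts the non-empty tokens left after splitting on separators.
    static int countWords(String text) {
        if (text == null) {
            return 0;
        }
        int n = 0;
        for (String token : SEPARATORS.split(text)) {
            // The first token is empty when the text starts with a separator.
            if (!token.isEmpty()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Hello, world! foo-bar")); // prints 4
    }
}
```

Because hyphens and apostrophes are punctuation, "foo-bar" counts as two words here, and so does "it's".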
The final counting method:
public static int wordCount(String string) {
    if (string == null) {
        return 0;
    }
    // Remove every Chinese character; each one counts as a single word.
    String nonChinese = string.replaceAll("[\u4e00-\u9fa5]", "");
    int chineseWordCount = string.length() - nonChinese.length();
    // Split what is left on runs of punctuation, symbols, and whitespace.
    String[] otherWords = nonChinese.split("[\\p{P}\\p{S}\\p{Z}\\s]+");
    int otherWordCount = 0;
    for (String word : otherWords) {
        // split() yields an empty first token when the text is empty or starts with a separator.
        if (!word.isEmpty()) {
            otherWordCount++;
        }
    }
    return chineseWordCount + otherWordCount;
}
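One property of the method above is worth knowing: because the Chinese characters are deleted before splitting, the Latin text on either side of a Chinese character is joined, so "abc中def" is counted as one Chinese character plus the single word "abcdef", giving 2. If Chinese characters should also act as word boundaries, a single-pass variant is one possible alternative. This is a sketch of a different technique and not the method above; the class name SinglePassWordCounter is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical alternative for illustration only.
final class SinglePassWordCounter {
    // A token is either one Chinese character (U+4E00 through U+9FA5),
    // or a run of characters that are neither Chinese nor separators.
    private static final Pattern TOKEN = Pattern.compile(
            "[\\u4e00-\\u9fa5]|[^\\u4e00-\\u9fa5\\p{P}\\p{S}\\p{Z}\\s]+");

    // Counts tokens in one pass over the input.
    static int wordCount(String text) {
        if (text == null) {
            return 0;
        }
        Matcher matcher = TOKEN.matcher(text);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // "abc", one Chinese character, "def".
        System.out.println(wordCount("abc\u4e2ddef")); // prints 3
    }
}
```

For input where Chinese and other text are already separated by spaces or punctuation, both approaches give the same result.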