Group numbering
The regular expression ((A)(B(C))) contains 4 capturing groups:
- ((A)(B(C)))
- (A)
- (B(C))
- (C)
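Numbering is simple: count the opening parentheses from left to right. Calling groupCount on a Matcher object returns the number of capturing groups in its pattern.
There is also a special group, group 0, which always stands for the entire match of the whole pattern. It is not included in the number returned by groupCount.
Knowing the group numbers matters because many Matcher methods take a group number as a parameter, for example: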
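A minimal sketch of the numbering rule, assuming the input "ABC" (chosen here only for illustration) and the hypothetical helper groupsOf. It matches ((A)(B(C))) and collects group 0 through groupCount(), so you can see that groupCount() reports 4 while group 0 is still available.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    // Hypothetical helper: returns the text of group 0..groupCount() for a full match.
    static String[] groupsOf(String input) {
        Matcher m = Pattern.compile("((A)(B(C)))").matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match: " + input);
        }
        // groupCount() excludes group 0, so we need groupCount() + 1 slots.
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] groups = groupsOf("ABC");
        System.out.println("groupCount = " + (groups.length - 1)); // groupCount = 4
        for (int i = 0; i < groups.length; i++) {
            // group 0: ABC, group 1: ABC, group 2: A, group 3: BC, group 4: C
            System.out.println("group " + i + ": " + groups[i]);
        }
    }
}
```

Groups 0 and 1 both print "ABC" here only because the outermost capturing group happens to span the whole pattern.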
- public int start(int group): Returns the start index of the subsequence captured by the given group during the previous match operation.
- public int end(int group): Returns the index of the last character, plus one, of the subsequence captured by the given group during the previous match operation.
- public String group(int group): Returns the input subsequence captured by the given group during the previous match operation.
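A small sketch of the three methods above, assuming the same pattern ((A)(B(C))) and a made-up input "xxABCyy" so that the match does not begin at index 0. The helper describe is hypothetical; it formats a group as text[start,end) to show that end() is exclusive ("plus one").

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupPositions {
    // Hypothetical helper: describes a group of the first match as "text[start,end)".
    static String describe(String input, int group) {
        Matcher m = Pattern.compile("((A)(B(C)))").matcher(input);
        if (!m.find()) {
            return null; // the pattern does not occur in the input
        }
        return m.group(group) + "[" + m.start(group) + "," + m.end(group) + ")";
    }

    public static void main(String[] args) {
        for (int i = 0; i <= 4; i++) {
            // ABC[2,5), ABC[2,5), A[2,3), BC[3,5), C[4,5)
            System.out.println("group " + i + ": " + describe("xxABCyy", i));
        }
    }
}
```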
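For example, suppose the pattern is (面积:)(\d+) ("面积" means "area") and we want to extract the area value:

The pattern matches "面积:123" as a whole, but we only want the number "123". Here we can call group(2) to get the number directly. Note that the + must sit inside the parentheses: with (\d)+ the group itself is repeated, and group(2) keeps only the last digit it captured, "3".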
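A minimal sketch of extracting the number with group(2). As an assumption for portability, it uses the English prefix "Area:" as a stand-in for "面积:"; the hypothetical helper group2 returns group 2 of the first match. It also contrasts (\d+) with (\d)+.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AreaExtractor {
    // Hypothetical helper: returns group 2 of the first match, or null if none.
    static String group2(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // "Area:" is an English stand-in for the article's "面积:" prefix.
        String input = "Area:123";
        // Quantifier inside the group: group 2 captures all the digits.
        System.out.println(group2("(Area:)(\\d+)", input)); // 123
        // Quantifier outside the group: group 2 keeps only its last repetition.
        System.out.println(group2("(Area:)(\\d)+", input)); // 3
    }
}
```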
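Backreferences
Let's look at a scenario first and then explain what a backreference is. Suppose we want to match numbers such as 1212 and 2323, where the first digit equals the third and the second digit equals the fourth. Nothing we have covered so far seems able to express this. This is where backreferences come in: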
The section of the input string matching the capturing group(s) is saved in memory for later recall via backreference. A backreference is specified in the regular expression as a backslash (\) followed by a digit indicating the number of the group to be recalled.
Let's see how to solve our problem with a backreference:

Here \1 stands for the text that group 1 matched.
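A sketch of one way to match the 1212/2323 shape with backreferences, assuming the pattern (\d)(\d)\1\2 (our reconstruction, not necessarily the author's exact regex) and a hypothetical helper isAbab. \1 must repeat whatever group 1 matched and \2 whatever group 2 matched.

```java
import java.util.regex.Pattern;

public class AbabNumbers {
    // Group 1 and group 2 each capture one digit; \1 and \2 must repeat them.
    private static final Pattern ABAB = Pattern.compile("(\\d)(\\d)\\1\\2");

    // Hypothetical helper: true if the whole string has the form "abab".
    static boolean isAbab(String s) {
        return ABAB.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"1212", "2323", "1234", "1122", "1111"};
        for (String s : samples) {
            // 1212 true, 2323 true, 1234 false, 1122 false, 1111 true
            System.out.println(s + " -> " + isAbab(s));
        }
    }
}
```

Note that "1111" also matches, because nothing forces the two captured digits to differ. Excluding that case would need an extra check such as a negative lookahead.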