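grep() searches a vector for the elements that match a given condition and, by default, returns their indices. grepl() has largely the same syntax as grep(), but returns a logical vector by default.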
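The difference between the two return types can be sketched as follows. The vector `genes` is a made-up example used only for illustration.

```r
# Hypothetical sample vector, for illustration only
genes <- c("TP53", "MCM2", "CAD", "CTPS1")

idx <- grep("^C", genes)    # integer positions of the matching elements
hit <- grepl("^C", genes)   # one TRUE/FALSE per element of genes

idx          # 3 4
hit          # FALSE FALSE TRUE TRUE
genes[idx]   # "CAD" "CTPS1"
genes[hit]   # "CAD" "CTPS1" (a logical vector can be used for subsetting too)
```

Both results can be used to subset the original vector; the logical form is often more convenient inside conditions or when combining several tests with & and |.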
grep()
Usage:
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
fixed = FALSE, useBytes = FALSE, invert = FALSE)
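grep() function arguments:
| Argument | Purpose |
|---|---|
| pattern | A character string containing a regular expression |
| x | The character vector in which matches are sought, or an object that can be coerced to a character vector. Long vectors are supported |
| ignore.case | If FALSE, pattern matching is case-sensitive; if TRUE, case is ignored during matching |
| perl | If TRUE, Perl-compatible regular expressions are used |
| value | If FALSE, returns a vector of the indices of the matches found by grep; if TRUE, returns a vector of the matching elements themselves |
| fixed | If TRUE, pattern is a string to be matched as is, not as a regular expression |
| useBytes | If TRUE, matching is done byte by byte rather than character by character |
| invert | If TRUE, returns the indices or values of the elements that do not match |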
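A minimal sketch of how several of these arguments change the result. The vector `x` is a made-up example, not taken from the examples in this article.

```r
# Hypothetical sample vector, for illustration only
x <- c("mcm2", "MCM3", "a.b", "axb")

default_idx <- grep("mcm", x)                                    # 1 (case-sensitive)
nocase_idx  <- grep("mcm", x, ignore.case = TRUE)                # 1 2
nocase_val  <- grep("mcm", x, ignore.case = TRUE, value = TRUE)  # "mcm2" "MCM3"

regex_val <- grep("a.b", x, value = TRUE)                # "a.b" "axb" ('.' is a wildcard)
fixed_val <- grep("a.b", x, fixed = TRUE, value = TRUE)  # "a.b" (literal dot)

inverted  <- grep("mcm", x, ignore.case = TRUE,
                  invert = TRUE, value = TRUE)           # "a.b" "axb"
```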
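Regular expressions in R
| Regex symbol | Meaning |
|---|---|
| ^ | Matches the start of a string |
| $ | Matches the end of a string |
| . | Matches any single character (with perl = TRUE, it does not match a newline by default) |
| * | Matches the preceding item zero or more times |
| ? | Matches the preceding item zero or one time |
| + | Matches the preceding item one or more times |
| .* | Matches any sequence of characters, including an empty one |
| \| | Alternation (logical OR): matches the expression on either side |
| [^] | Negated character class: matches any single character not listed inside the brackets |
| [] | Character class: matches any single character listed inside the brackets. List the characters without separators; a separator such as a comma would itself become part of the set |
| [-] | Matches any single character within a range |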
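The quantifiers and character classes in the table can be compared on a small input. This is a sketch on a made-up vector `x`; the patterns are anchored with ^ and $ because grep() otherwise reports a match anywhere inside an element, which hides the difference between *, ? and +.

```r
# Hypothetical sample vector, for illustration only
x <- c("a", "ab", "abb", "ac")

star  <- grep("^ab*$", x, value = TRUE)    # "a" "ab" "abb"  (zero or more b)
opt   <- grep("^ab?$", x, value = TRUE)    # "a" "ab"        (zero or one b)
plus  <- grep("^ab+$", x, value = TRUE)    # "ab" "abb"      (one or more b)
cls   <- grep("^a[bc]$", x, value = TRUE)  # "ab" "ac"       (b or c)
ncls  <- grep("^a[^b]$", x, value = TRUE)  # "ac"            (any character except b)
alt   <- grep("^ab$|^ac$", x, value = TRUE) # "ab" "ac"      (either alternative)
```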
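Greedy and lazy matching
By default, a quantifier matches as many characters as possible; this is greedy matching. For example, sub("a.*b","",c("aabab","eabbe")) removes the longest substring that starts with a and ends with b: the whole of "aabab" and the "abb" in "eabbe", giving "" and "ee". For lazy matching, which stops at the shortest possible match, append "?" to the quantifier: sub("a.*?b","",c("aabab","eabbe")) removes only the shortest substring that begins at the first a and ends with b ("aab" and "ab" respectively), giving "ab" and "ebe".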
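The two behaviors can be checked side by side. This sketch passes perl = TRUE for the lazy pattern, which is a choice made here for illustration rather than a requirement stated in the text above.

```r
x <- c("aabab", "eabbe")

# Greedy: .* extends to the last "b" it can reach
greedy <- sub("a.*b", "", x)
greedy   # "" "ee"

# Lazy: .*? stops at the first "b" after the first "a"
lazy <- sub("a.*?b", "", x, perl = TRUE)
lazy     # "ab" "ebe"
```

Note that sub() replaces only the first match in each element; gsub() would replace every match.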
grep() examples:
1. Using ^:
Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
             'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
             'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
             'GLS','GLS2','NADSYN1')
# Keep the names that start with "C"
Results <- grep('^C', Protein, value = TRUE)
Results
# "CAD" "CTPS1" "CTPS2"

2. Using $:
Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
             'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
             'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
             'GLS','GLS2','NADSYN1')
# Keep the names that end with "2"
Results <- grep('2$', Protein, value = TRUE)
Results
# "MCM2" "TGM2" "CTPS2" "GLS2"

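3. Using .:
Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
             'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
             'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
             'GLS','GLS2','NADSYN1')
# Keep the names that contain "MCM" followed by any one character
Results <- grep('MCM.', Protein, value = TRUE)
Results
# "MCM2" "MCM3" "MCM4" "MCM5" "MCM6" "MCM7"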

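4. Using *:
Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
             'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
             'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
             'GLS','GLS2','NADSYN1','DDB1','DDB2','DAO',
             'DDO','DCLRE1C','DLC1')
# * repeats the item before it zero or more times, so it must follow something.
# Keep the names that are "GLS" followed by zero or more "2"
Results <- grep('^GLS2*$', Protein, value = TRUE)
Results
# "GLS" "GLS2"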

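5. Using ?:
Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
             'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
             'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
             'GLS','GLS2','NADSYN1','DDB1','DDB2','DAO',
             'DDO','DCLRE1C','DLC1','USP11')
# ? makes the item before it optional (zero or one occurrence).
# Keep the names that are "D", at most one further character, then "O"
Results <- grep('^D.?O$', Protein, value = TRUE)
Results
# "DAO" "DDO"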

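6. Using +:
Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
             'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
             'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
             'GLS','GLS2','NADSYN1','DDB1','DDB2','DAO',
             'DDO','DCLRE1C','DLC1','USP11')
# + repeats the item before it one or more times.
# Keep the names that contain "D" followed by at least one more "D"
Results <- grep('DD+', Protein, value = TRUE)
Results
# "DDB1" "DDB2" "DDO"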

7. Using .*:
Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
             'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
             'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
             'GLS','GLS2','NADSYN1','DDB1','DDB2','DAO',
             'DDO','DCLRE1C','DLC1','USP11')
# Keep the names that contain a "T" with a "3" anywhere after it
Results <- grep('T.*3', Protein, value = TRUE)
Results
# "TP53" "TGM3"

8. Using |:
Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
             'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
             'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
             'GLS','GLS2','NADSYN1','DDB1','DDB2','DAO',
             'DDO','DCLRE1C','DLC1','USP11')
# Keep the names that start with "T" or end with "3"
Results <- grep('^T|3$', Protein, value = TRUE)
Results
# "TP53" "MCM3" "TGM1" "TGM2" "TGM3" "TGM4" "TGM5" "TGM6" "TGM7"

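9. Using [^]:
Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
             'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
             'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
             'GLS','GLS2','NADSYN1','DDB1','DDB2','DAO',
             'DDO','DCLRE1C','DLC1','USP11')
# [^TP53] matches any single character other than T, P, 5 or 3, so every name
# containing at least one such character is kept (here: all names except "TP53")
Results <- grep('[^TP53]', Protein, value = TRUE)
Results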

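Because [^TP53] negates a set of characters rather than the string "TP53", it is not a general way to exclude one name. A minimal sketch of the difference, using a made-up vector `ids` and the invert argument described earlier:

```r
# Hypothetical sample vector, for illustration only
ids <- c("TP53", "TP5", "P53", "GLS")

# Negated class: keeps elements containing a character outside T, P, 5, 3
by_class <- grep("[^TP53]", ids, value = TRUE)
by_class    # "GLS"

# invert = TRUE: keeps every element that is not exactly "TP53"
by_invert <- grep("^TP53$", ids, invert = TRUE, value = TRUE)
by_invert   # "TP5" "P53" "GLS"
```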
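10. Using []:
Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
             'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
             'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
             'GLS','GLS2','NADSYN1','DDB1','DDB2','DAO',
             'DDO','DCLRE1C','DLC1','USP11')
# Keep the names that contain 4, 3, 9 or 6; the characters are listed without
# separators, because a comma inside [] would itself be matched
Results <- grep('[4396]', Protein, value = TRUE)
Results
# "TP53" "MCM3" "MCM4" "MCM6" "TGM3" "TGM4" "TGM6"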

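Every character written between the brackets is a member of the set, including punctuation. The sketch below uses a made-up vector `labels` to show what changes when commas are placed inside the class.

```r
# Hypothetical sample vector, for illustration only
labels <- c("A,B", "A4", "AB")

with_comma    <- grep("[4,3]", labels, value = TRUE)  # comma is part of the set
without_comma <- grep("[43]", labels, value = TRUE)   # digits only

with_comma      # "A,B" "A4"
without_comma   # "A4"
```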
11. Using [-]:
Protein <- c('TP53','GMPS','CAD','MCM2','MCM3','MCM4',
             'MCM5','MCM6','MCM7','TGM1','TGM2','TGM3',
             'TGM4','TGM5','TGM6','TGM7','CTPS1','CTPS2',
             'GLS','GLS2','NADSYN1','DDB1','DDB2','DAO',
             'DDO','DCLRE1C','DLC1','USP11')
# Keep the names that contain any digit in the range 1 to 3
Results <- grep('[1-3]', Protein, value = TRUE)
Results
