Regular Expressions (Part 2)
The re module (regex)
Python has no built-in regular expression functions; import the standard-library re module to use them.
re module functions:
- match: matches only at the start of the string and returns a single result
  - Typical use: matching a URL passed in by the user (as web frameworks do)
- search: scans the whole string and returns only the first match
- findall: scans the whole string and returns a list of all matches
- finditer: much like findall, but returns an iterator that creates each match only as you iterate over it
- split: splits a string
- sub: replaces matches
- match and findall are the ones most commonly used for lookups
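The differences between these functions are easiest to see side by side. The following is a minimal sketch; the sample string `text` and the pattern are invented for illustration and do not come from the notes above.

```python
import re

text = 'tom 19, amy 23'   # sample data invented for this sketch
pattern = r'\d+'

at_start = re.match(pattern, text)       # only tries at position 0
first = re.search(pattern, text)         # first hit anywhere in the string
all_hits = re.findall(pattern, text)     # list of matched strings
lazy_hits = re.finditer(pattern, text)   # iterator of match objects
pieces = re.split(pattern, text)         # the text between the hits
masked = re.sub(pattern, '#', text)      # every hit replaced

print(at_start)                          # None (the string starts with a letter)
print(first.group())                     # 19
print(all_hits)                          # ['19', '23']
print([m.group() for m in lazy_hits])    # ['19', '23']
print(pieces)                            # ['tom ', ', amy ', '']
print(masked)                            # tom #, amy #
```

Note that match and search return a match object (or None), while findall returns plain strings.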
Purpose of groups:
- Extract part of the matched result
Kinds of lookup in the re module (match, search, findall):
- Plain matching:
  - On the object returned by a successful match or search, group() returns the complete matched text

    import re

    origin = 'Hello tom, xxx tom, yyy tom haha 19'
    regex = r'H\w+'
    ret = re.match(regex, origin)
    print(ret.group())      # Hello
    print(ret.groups())     # ()
    print(ret.groupdict())  # {}

- Group matching:
  - Groups extract parts of the match
  - match: matches from the start of the string

    origin = 'Hello tom, xxx tom, yyy tom haha 19'
    regex = r'(H)(\w+)'
    ret = re.match(regex, origin)
    print(ret.group())      # Hello
    print(ret.groups())     # ('H', 'ello')
    print(ret.groupdict())  # {}

    # Same pattern, but the first group is named n1
    regex = r'(?P<n1>H)(\w+)'
    ret = re.match(regex, origin)
    print(ret.group())      # Hello
    print(ret.groups())     # ('H', 'ello')
    print(ret.groupdict())  # {'n1': 'H'}

  - search: once you know the group methods, search is used exactly like match; the only difference is that match anchors at the start of the string while search scans the whole string
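The notes give no code for search, so here is a minimal sketch of the difference from match, reusing the same `origin` string. The pattern and the group name `name` are assumptions chosen for illustration.

```python
import re

origin = 'Hello tom, xxx tom, yyy tom haha 19'
regex = r'(?P<name>t\w+), (\w+)'   # illustrative pattern, not from the notes

# match only tries at position 0, where the string starts with 'Hello'
from_start = re.match(regex, origin)
print(from_start)            # None

# search scans forward and stops at the first place the pattern fits
found = re.search(regex, origin)
print(found.group())         # tom, xxx
print(found.groups())        # ('tom', 'xxx')
print(found.groupdict())     # {'name': 'tom'}
print(found.span())          # (6, 14)
```

Because match can return None, real code should check the result before calling group() on it.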
findall: returns a list of all matches
- Without groups:

    regex = r'app\w+'
    origin = 'i have an apple, i have a pen, apple pen'
    ret = re.findall(regex, origin)
    print(ret)  # ['apple', 'apple']

- With groups:
  - findall puts the contents of groups() for each match into the list (the point of groups: extracting parts)
  - Naming a group with ?P<key> makes no difference to findall; named groups are returned by position like any other group
  - With nested groups, the outer group comes first, then the inner ones

    origin = 'i have an apple, i have a pen, apple pen'

    regex = r'(app\w+)'
    ret = re.findall(regex, origin)
    print(ret)  # ['apple', 'apple']

    regex = r'(app)\w+'
    ret = re.findall(regex, origin)
    print(ret)  # ['app', 'app']

    regex = r'(app)(\w+)'
    ret = re.findall(regex, origin)
    print(ret)  # [('app', 'le'), ('app', 'le')]

    regex = r'((a)(pp))(\w+)'
    ret = re.findall(regex, origin)
    print(ret)  # [('app', 'a', 'pp', 'le'), ('app', 'a', 'pp', 'le')]
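The point about named groups has no example above, so the following sketch illustrates it. The group names `head` and `tail` are made up for this example; the use of finditer and of a non-capturing group `(?:...)` are suggestions, not something the notes prescribe.

```python
import re

origin = 'i have an apple, i have a pen, apple pen'

# Named groups come back as plain tuples, exactly like unnamed groups
named = re.findall(r'(?P<head>app)(?P<tail>\w+)', origin)
print(named)      # [('app', 'le'), ('app', 'le')]

# To keep the names, iterate over match objects and ask for groupdict()
by_name = [m.groupdict()
           for m in re.finditer(r'(?P<head>app)(?P<tail>\w+)', origin)]
print(by_name)    # [{'head': 'app', 'tail': 'le'}, {'head': 'app', 'tail': 'le'}]

# A non-capturing group (?:...) groups without capturing,
# so findall returns the whole match again
whole = re.findall(r'(?:app)\w+', origin)
print(whole)      # ['apple', 'apple']
```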
finditer: match objects are created only as you iterate
- The objects it yields are the same type as those returned by match and search

    regex = r'((a)(pp))(\w+)'
    origin = 'i have an apple, i have a pen, apple pen'
    ret1 = re.search(regex, origin)
    # Python 3.7+ prints <class 're.Match'>; older versions print <class '_sre.SRE_Match'>
    print('type of search', type(ret1))
    ret2 = re.finditer(regex, origin)
    for i in ret2:
        # Same type as the object returned by search
        print('type of finditer', type(i))
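Since the items yielded by finditer are ordinary match objects, the group methods shown earlier work on them too. A short sketch, reusing the same `regex` and `origin`, that steps through the iterator by hand:

```python
import re

regex = r'((a)(pp))(\w+)'
origin = 'i have an apple, i have a pen, apple pen'

matches = re.finditer(regex, origin)

first = next(matches)            # the first match object is produced here
print(first.group())             # apple
print(first.groups())            # ('app', 'a', 'pp', 'le')
print(first.span())              # (10, 15)

rest = [m.span() for m in matches]   # consumes the remaining matches
print(rest)                      # [(31, 36)]

leftover = list(matches)
print(leftover)                  # [] because the iterator is now exhausted
```

An iterator can be walked only once; call re.finditer again if the matches are needed a second time.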
Notes on groups:
- The number of groups in the result is fixed by how many groups you actually wrote in the pattern; if a group captures more values than that, only the last one is kept

    regex = r'(\w)(\w)(\w)(\w)'
    ret = re.findall(regex, 'tony')
    print(ret)  # [('t', 'o', 'n', 'y')]

    regex = r'(\w){4}'
    ret = re.findall(regex, 'tony')
    print(ret)  # ['y']

  - The first pattern writes 4 groups of one character each, (\w)(\w)(\w)(\w), so the result contains 4 groups
  - The second pattern writes 1 group that must match 4 characters. Only one group was written, so the result contains 1 group; each repetition captures into that same group and overwrites the previous capture, so the last character is what remains

    ret = re.findall(r'\dyhh*', '1yhh2yhh3andyhh4yhh')
    print(ret)  # ['1yhh', '2yhh', '4yhh']

    ret = re.findall(r'(\dyhh)*', '1yhh2yhh3andyhh4yhh')
    print(ret)  # ['2yhh', '', '', '', '', '', '', '', '4yhh', '']

- When matching, avoid patterns that can match the empty string
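The empty strings in the last result appear because `(\dyhh)*` is allowed to match zero repetitions at every position. A sketch of some ways to avoid that, offered as options rather than as the only fix:

```python
import re

origin = '1yhh2yhh3andyhh4yhh'

# '*' allows zero repetitions, so the pattern also matches "nothing"
with_empties = re.findall(r'(\dyhh)*', origin)
print(with_empties)   # ['2yhh', '', '', '', '', '', '', '', '4yhh', '']

# '+' demands at least one repetition; the group still keeps only the last one
at_least_once = re.findall(r'(\dyhh)+', origin)
print(at_least_once)  # ['2yhh', '4yhh']

# A non-capturing group returns each whole run instead of the last repetition
whole_runs = re.findall(r'(?:\dyhh)+', origin)
print(whole_runs)     # ['1yhh2yhh', '4yhh']

# Dropping the repetition returns every occurrence separately
each = re.findall(r'\dyhh', origin)
print(each)           # ['1yhh', '2yhh', '4yhh']
```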
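split: splitting
- Without groups: the result does not include the separator itself
- With groups: the result keeps the text captured by groups inside the separator

    regex = r'\d+'
    origin = 'ad12ad134a34da'
    ret = re.split(regex, origin)
    print(ret)  # ['ad', 'ad', 'a', 'da']

    regex = r'(\d+)'
    origin = 'ad12ad134a34da'
    ret = re.split(regex, origin, maxsplit=1)
    print(ret)  # ['ad', '12', 'ad134a34da']

- Use case: a calculator that finds the innermost parentheses and strips them off

    origin = '7+((3+4)*5-(4-9)*5)*4'
    # Match the innermost parentheses and capture what is inside them,
    # so the split result keeps the inner expression without the parentheses
    regex = r'\(([^()]+)\)'
    ret = re.split(regex, origin, maxsplit=1)
    print(ret)  # ['7+(', '3+4', '*5-(4-9)*5)*4']

- Then loop, checking whether the split result is a list of length 3, and keep evaluating

    while True:
        ret = re.split(regex, origin, maxsplit=1)
        if len(ret) == 3:
            # Evaluate the contents of the parentheses, then rebuild origin.
            # evaluate() is a placeholder for a function you supply.
            origin = ret[0] + str(evaluate(ret[1])) + ret[2]
        else:
            # No parentheses left: stop and return the result
            break

sub: replaces matches; subn: replaces matches and returns a tuple of (new string, number of replacements)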
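The calculator loop above leaves the evaluation step open, and sub and subn are only named. The sketch below fills in both under stated assumptions: `eval_flat` and `calculate` are hypothetical helpers written for this example, and they assume the input contains only plain decimal numbers, the operators + - * /, and parentheses, with values small enough that Python prints them without an exponent.

```python
import re

INNERMOST = re.compile(r'\(([^()]+)\)')


def eval_flat(expr):
    """Evaluate a parenthesis-free expression (hypothetical helper)."""
    tokens = re.findall(r'\d+(?:\.\d+)?|[-+*/]', expr)
    pos = 0

    def number():
        # Read an optionally signed number, e.g. the '-5.0' in '35--5.0'
        nonlocal pos
        sign = 1
        while tokens[pos] in '+-':
            if tokens[pos] == '-':
                sign = -sign
            pos += 1
        value = float(tokens[pos])
        pos += 1
        return sign * value

    def term():
        # Handle * and / first, left to right
        nonlocal pos
        value = number()
        while pos < len(tokens) and tokens[pos] in '*/':
            op = tokens[pos]
            pos += 1
            rhs = number()
            value = value * rhs if op == '*' else value / rhs
        return value

    total = term()
    while pos < len(tokens):
        op = tokens[pos]
        pos += 1
        rhs = term()
        total = total + rhs if op == '+' else total - rhs
    return total


def calculate(origin):
    """Resolve the innermost parentheses repeatedly, as in the loop above."""
    while True:
        ret = INNERMOST.split(origin, maxsplit=1)
        if len(ret) == 3:
            before, inner, after = ret
            origin = f'{before}{eval_flat(inner)}{after}'
        else:
            break
    return eval_flat(origin)


result = calculate('7+((3+4)*5-(4-9)*5)*4')
print(result)   # 247.0

# sub returns the new string; subn also reports how many replacements were made
replaced = re.sub(r'\d+', '#', 'ad12ad134a34da')
replaced_n = re.subn(r'\d+', '#', 'ad12ad134a34da')
print(replaced)    # ad#ad#a#da
print(replaced_n)  # ('ad#ad#a#da', 3)
```

The result is a float because `eval_flat` converts every number with float(); an integer-only variant would need a different conversion and a decision about how division should behave.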