Original article: Regular Expression. If you prefer the original wording, please read the original. This translation is for study purposes only.

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski

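Recently I have been busy extending and maintaining crystal-mode. Honestly, it is not easy, especially the regular expression parts, which are very hard to read. So I plan to rewrite the existing regular expressions in a more readable s-expression form, which should make maintenance much simpler. The rx macro fits this goal well.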

(require 's)  ;; All we need is =s-matches-p=
(require 'rx)

;; Creating a regexp that will match -> <File> [<Line>:<Column>] <Suggestion>
;; The file name must agree with the string tested below
(setq this-file-name "blog.org")

(s-matches-p
 (rx bol
     (eval this-file-name)
     space
     "[" (group (one-or-more digit)) ":" (group (one-or-more digit)) "]"
     space
     (group (zero-or-more anything))
     eol)
 "blog.org [17:16] Emacs Lisp, not emacs lisp")

;; Produced regexp, I do not want to write or maintain this by hand
"^blog\\.org[[:space:]]\\[\\([[:digit:]]+\\):\\([[:digit:]]+\\)][[:space:]]\\(\\(?:.\\|\n\\)*\\)$"

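Although it is not as terse, the example above shows the benefits of writing regular expressions at a higher level of abstraction: they are easier to understand, more comfortable to write, and easier to maintain. Using s-expressions also suits the spirit of Emacs.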

Strings And Quoting<a id="sec-1"></a>

STRING
     matches string STRING literally.

CHAR
     matches character CHAR literally.

‘(eval FORM)’
     evaluate FORM and insert result.  If result is a string,
     ‘regexp-quote’ it.

Problem: what regular expression matches the string: The punctuation characters in the ASCII table are: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}

;; Escape the double quote and the backslash here; a lone \] inside a
;; Lisp string would silently drop the backslash
(setq input "The punctuation characters in the ASCII table are: !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}")

(s-matches-p (rx "The punctuation characters in the ASCII table are: !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}")
             input) ;; Direct use of strings

(not (s-matches-p input input)) ;; Does not work because of quoting
(s-matches-p (regexp-quote input) input)

(s-matches-p (rx (eval input)) input) ;; More rx

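If you know the regexp syntax characters well, it is easy to see that this problem is only about quoting or escaping them. The function regexp-quote escapes these characters, which is simple. rx quotes string arguments by default, so the string can be passed in directly. Finally, the eval form lets you use a string variable and still get the quoting.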
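The difference between a raw pattern, regexp-quote, and rx can be seen on a smaller string. This is a minimal sketch: `my-literal` is a hypothetical variable introduced only for illustration, and the built-in `string-match-p` is used so that no external package is needed.

```elisp
(require 'rx)

;; `my-literal' is a hypothetical sample string full of regexp operators.
(defvar my-literal "1+1=2 (really?)")

;; Used as a raw pattern, `+' and `?' act as operators, so the string
;; does not match itself.
(string-match-p my-literal my-literal)                 ; → nil

;; Quoted, every operator is matched literally.
(string-match-p (regexp-quote my-literal) my-literal)  ; → 0

;; rx quotes string arguments on its own.
(string-match-p (rx "1+1=2 (really?)") my-literal)     ; → 0

;; What quoting actually produces.
(regexp-quote "a.b*c")                                 ; → "a\\.b\\*c"
```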

Variables And Ranges<a id="sec-2"></a>

    ‘(any SET ...)’
    ‘(in SET ...)’
    ‘(char SET ...)’
         matches any character in SET ....  SET may be a character or string.
         Ranges of characters can be specified as ‘A-Z’ in strings.
         Ranges may also be specified as conses like ‘(?A . ?Z)’.

         SET may also be the name of a character class: ‘digit’,
         ‘control’, ‘hex-digit’, ‘blank’, ‘graph’, ‘print’, ‘alnum’,
         ‘alpha’, ‘ascii’, ‘nonascii’, ‘lower’, ‘punct’, ‘space’, ‘upper’,
         ‘word’, or one of their synonyms.

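Problem: create a regular expression that matches all common misspellings of calendar, so the word can be found in a document without relying on the author's spelling ability. Allow an a or an e in each vowel position.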

(s-matches-p (rx "c"
                 (any "a" "e")
                 "l"
                 (any "a" "e")
                 "nd"
                 (any "a" "e")
                 "r")
             "celander")

(setq misspelling-pattern `(any "a" "e"))

(s-matches-p (rx "c"
                 (eval misspelling-pattern)
                 "l"
                 (eval misspelling-pattern)
                 "nd"
                 (eval misspelling-pattern)
                 "r")
             "calendar")

"c[ae]l[ae]nd[ae]r" ;; Generated pattern

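Besides demonstrating a simple character set, using sub-patterns through the now-familiar eval form lets these expressions be handled more modularly, which helps to get away from one monolithic concatenated string.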

Problem: create a regular expression that matches a single hexadecimal character.

(s-matches-p (rx (any "a-f" "A-F" "0-9"))
             "A")
(s-matches-p (rx (in "a-f" "A-F" "0-9"))
             "A") ;; Equivalently

"[0-9A-Fa-f]" ;; Generated pattern


(s-matches-p (rx (char hex-digit))
             "d") ;; More rx
(s-matches-p (rx hex-digit)
             "d") ;; Equivalently

"[[:xdigit:]]" ;; Generated pattern

Lastly, the set syntax allows the familiar dash to denote a character range. Beyond that, the named character classes such as [:upper:] or [:xdigit:] are good to know. Other useful constructs such as word-start, line-end, and punctuation also exist and are worth exploring.
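A short sketch of those constructs follows. The sample strings are made up for illustration, and the results assume the standard syntax table, in which letters are word constituents.

```elisp
(require 'rx)

;; word-start and word-end anchor a match to word boundaries.
(string-match-p (rx word-start "cat" word-end) "a cat sat")    ; → 2
(string-match-p (rx word-start "cat" word-end) "concatenate")  ; → nil

;; punct is the [:punct:] class; line-end matches at the end of a line.
(string-match-p (rx punct line-end) "Hello!")                  ; → 5
```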

Alternatives And Depth<a id="sec-3"></a>

    ‘(or SEXP1 SEXP2 ...)’
    ‘(| SEXP1 SEXP2 ...)’
         matches anything that matches SEXP1 or SEXP2, etc.  If all
         args are strings, use ‘regexp-opt’ to optimize the resulting
         regular expression.

    ‘(zero-or-one SEXP ...)’
    ‘(optional SEXP ...)’
    ‘(opt SEXP ...)’
         matches zero or one occurrences of A.

    ‘(and SEXP1 SEXP2 ...)’
    ‘(: SEXP1 SEXP2 ...)’
    ‘(seq SEXP1 SEXP2 ...)’
    ‘(sequence SEXP1 SEXP2 ...)’
         matches what SEXP1 matches, followed by what SEXP2 matches, etc.

    ‘(repeat N SEXP)’
    ‘(= N SEXP ...)’
         matches N occurrences.

Problem: create a regular expression that, when applied repeatedly to the text Mary, Jane, and Sue went to Mary's house, matches Mary, Jane, Sue, and then Mary again.

(s-match-strings-all
 (rx (or "Mary" "Jane" "Sue"))
 "Mary, Jane, and Sue went to Mary's house")

;; Output
'(("Mary") ("Jane") ("Sue") ("Mary"))

;; Generated pattern
"\\(?:Jane\\|Mary\\|Sue\\)"

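This simple problem is an example of the alternation construct, which is closely related to sets and classes. Nothing fancy, but there is room to make it more nuanced.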

Problem: create a regular expression that matches 0 to 255.

(setq range-expression ;; Expression and pattern separated for reuse
      `(or "0"
           (sequence "1" (optional digit (optional digit)))
           (sequence "2" (optional
                          (or
                           (sequence (any "0-4") (optional digit))
                           (sequence "5" (optional (any "0-5")))
                           (sequence (any "6-9") (optional digit)))))
           (sequence (any "3-9") (optional digit))))

(setq range-pattern (rx (eval range-expression)))

;; A test for the regular expression
(require 'cl-lib) ;; cl-every lives in cl-lib; the old cl library is deprecated
(cl-every (lambda (number)
            (s-matches-p range-pattern (number-to-string number)))
          (number-sequence 0 255))

;; Generated pattern
"0\\|1\\(?:[[:digit:]][[:digit:]]?\\)?\\|2\\(?:[0-4][[:digit:]]?\\|5[0-5]?\\|[6-9][[:digit:]]?\\)?\\|[3-9][[:digit:]]?"

;; To use this for IP addresses
(setq ip4-pattern (rx (repeat 3 (sequence (eval range-expression) "."))
                      (eval range-expression)))

;; Testing for permutation might take too long, one is good enough
(s-matches-p ip4-pattern
             "61.12.234.251")

;; Generated pattern
"\\(?:\\(?:0\\|1\\(?:[[:digit:]][[:digit:]]?\\)?\\|2\\(?:[0-4][[:digit:]]?\\|5[0-5]?\\|[6-9][[:digit:]]?\\)?\\|[3-9][[:digit:]]?\\)\\.\\)\\{3\\}\\(?:0\\|1\\(?:[[:digit:]][[:digit:]]?\\)?\\|2\\(?:[0-4][[:digit:]]?\\|5[0-5]?\\|[6-9][[:digit:]]?\\)?\\|[3-9][[:digit:]]?\\)"

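The range-expression above is flawed: the (any "6-9") branch after "2" also accepts 260 through 299, and because the pattern is not anchored the test cannot catch it. A corrected version with tests follows: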

(setq range-expression ;; Expression and pattern separated for reuse
      `(or "0"
           (sequence "1" (optional digit (optional digit)))
           (sequence "2" (optional
                          (or
                           (sequence (any "0-4") (optional digit))
                           (sequence "5" (optional (any "0-5")))
                           (optional digit))))
           (sequence (any "3-9") (optional digit))))

(setq range-pattern (rx bol (eval range-expression) eol))

;; A test for the regular expression
(require 'cl-lib)
(cl-every (lambda (number)
            (s-matches-p range-pattern (number-to-string number)))
          (number-sequence 0 255))

(cl-every (lambda (number)
            (not (s-matches-p range-pattern (number-to-string number))))
          (number-sequence 256 355))

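The idea behind this expression is to match the first digit and then consider the branches. Even without a detailed explanation the syntax should help, but three new constructs deserve a note. First, optional or opt is equivalent to zero-or-one. Second, sequence or seq is mainly a wrapper that groups several expressions where a single form is required. Third, repeat is the same as the repetition construct of raw patterns. New syntax aside, the problem is only there to showcase it.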

Also, remember to write tests for your regular expressions.
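One way to keep such tests around is ERT, the test framework bundled with Emacs. The sketch below is an assumption about how one might organise it, not the author's setup: `my-octet-regexp` and the test names are hypothetical, and the octet pattern is written in a different but equivalent shape, anchored to the whole string.

```elisp
(require 'ert)
(require 'rx)
(require 'cl-lib)

;; Hypothetical name; a whole-string matcher for one octet (0-255).
(defconst my-octet-regexp
  (rx string-start
      (or (seq "25" (any "0-5"))       ; 250-255
          (seq "2" (any "0-4") digit)  ; 200-249
          (seq "1" digit digit)        ; 100-199
          (seq (any "1-9") digit)      ; 10-99
          digit)                       ; 0-9
      string-end))

(ert-deftest my-octet-regexp-accepts-0-to-255 ()
  (should (cl-every (lambda (n)
                      (string-match-p my-octet-regexp (number-to-string n)))
                    (number-sequence 0 255))))

(ert-deftest my-octet-regexp-rejects-out-of-range ()
  (should-not (cl-some (lambda (s) (string-match-p my-octet-regexp s))
                       '("256" "260" "299" "1000" "-1" "" "01"))))
```

After evaluating the definitions, M-x ert runs the tests interactively. Testing rejected inputs matters as much as testing accepted ones, since an unanchored or overly broad pattern passes every positive test.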

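Before I forget: eval requires the variable to exist in the interpreter, because rx evaluates the form when the macro is expanded. This means the variable must be set globally with setq before use, which is why the snippet has two separate setters, one for the expression and one for the pattern. As a refactoring, defining the expression or the pattern with defconst or defvar is recommended. Unfortunately, let does not work with eval, but that is not a huge cost.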
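Where a locally bound sub-pattern is wanted, one possible workaround is rx-to-string, the function counterpart of rx, which builds the pattern at run time from an ordinary Lisp value. This is a sketch under that assumption; `my-calendar-regexp` is a hypothetical helper name.

```elisp
(require 'rx)

;; Hypothetical helper: VOWEL is an rx form held in an ordinary
;; function argument, so no global setq is needed.
(defun my-calendar-regexp (vowel)
  "Return a regexp for misspellings of calendar, using rx form VOWEL."
  ;; Backquote splices the local value into the rx form at run time.
  ;; The second argument t asks rx-to-string not to wrap the result
  ;; in an extra shy group.
  (rx-to-string `(seq "c" ,vowel "l" ,vowel "nd" ,vowel "r") t))

(string-match-p (my-calendar-regexp '(any "a" "e")) "celander")  ; → 0
(string-match-p (my-calendar-regexp '(any "a" "e")) "colander")  ; → nil
```

The trade-off is that the pattern is built on every call instead of once at macro-expansion time, which is usually negligible.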

Groups And Backreferences<a id="sec-4"></a>

    ‘(submatch SEXP1 SEXP2 ...)’
    ‘(group SEXP1 SEXP2 ...)’
         like ‘and’, but makes the match accessible with ‘match-end’,
         ‘match-beginning’, and ‘match-string’.

    ‘(submatch-n N SEXP1 SEXP2 ...)’
    ‘(group-n N SEXP1 SEXP2 ...)’
         like ‘group’, but make it an explicitly-numbered group with
         group number N.

Problem: create a regular expression that matches any date in yyyy-mm-dd format and captures the year, month, and day separately. As an extra challenge, name the groups.

(setq date-pattern
      (rx (group-n 3 (repeat 4 digit))
          "-"
          (group-n 2 (repeat 2 digit))
          "-"
          (group-n 1 (repeat 2 digit))))

(s-match-strings-all date-pattern
                     (format-time-string "%F"))

;; Pattern and output; note the groups come back as day, month, year (reverse order)
"\\(?3:[[:digit:]]\\{4\\}\\)-\\(?2:[[:digit:]]\\{2\\}\\)-\\(?1:[[:digit:]]\\{2\\}\\)"
'(("2017-03-30" "30" "03" "2017"))

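Capturing groups are essential, and this is where the syntax pays off. Named groups are not possible here; only numbered groups are available. Note that this is not a limitation of the macro but of the Emacs Lisp regular expression syntax.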

The group-n and group forms are obvious in intent. For group-n, the first argument is the group number and the rest is the actual expression. Nothing fancy.
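Since the groups cannot be named in the pattern itself, one way to approximate names is to attach them after matching. The following is a minimal sketch; `my-parse-date` is a hypothetical helper, and returning a plist is only one of several reasonable choices.

```elisp
(require 'rx)

;; Hypothetical helper: parse a yyyy-mm-dd string into a plist so that
;; callers refer to the parts by name instead of by group number.
(defun my-parse-date (string)
  "Return (:year Y :month M :day D) for STRING, or nil if it is not a date."
  (when (string-match (rx string-start
                          (group-n 1 (= 4 digit)) "-"
                          (group-n 2 (= 2 digit)) "-"
                          (group-n 3 (= 2 digit))
                          string-end)
                      string)
    (list :year  (match-string 1 string)
          :month (match-string 2 string)
          :day   (match-string 3 string))))

(my-parse-date "2017-03-30")  ; → (:year "2017" :month "03" :day "30")
(my-parse-date "30/03/2017")  ; → nil
```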

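Problem: create a regular expression that matches "magical" dates in yyyy-mm-dd format. A date is magical if the year minus the century, the month, and the day of the month are all the same number. For example, 2008-08-08 is a magical date.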

(setq magical-pattern
      (rx
       (repeat 2 digit)
       (group-n 1 (repeat 2 digit))
       "-"
       (backref 1)
       "-"
       (backref 1)))

(s-matches-p magical-pattern
             "2008-08-08")

;; Generated pattern
"[[:digit:]]\\{2\\}\\(?1:[[:digit:]]\\{2\\}\\)-\\1-\\1"

This simply shows that backreferences are available. The backref form refers to a group by its numeric argument. Again, nothing complicated.
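Another common use of backreferences is spotting an accidentally doubled word. This is a sketch with a made-up sample sentence and a hypothetical helper name, `my-find-doubled-word`; it assumes the standard syntax table.

```elisp
(require 'rx)

;; Hypothetical helper: find the first immediately repeated word.
(defun my-find-doubled-word (text)
  "Return (POSITION WORD) for the first doubled word in TEXT, or nil."
  (when (string-match (rx word-start (group (one-or-more word))
                          (one-or-more space)
                          (backref 1) word-end)
                      text)
    (list (match-beginning 0) (match-string 1 text))))

(my-find-doubled-word "this is is a test")  ; → (5 "is")
(my-find-doubled-word "this is a test")     ; → nil
```

The word-start and word-end anchors keep the "is" inside "this" from pairing with the following "is".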

re-builder<a id="sec-5"></a>

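To check the regular expressions you write, Emacs provides a user interface for testing and experimenting with them: re-builder. Run the command re-builder or regexp-builder in a buffer that contains text, then run reb-change-syntax and select rx. (The original post shows a screenshot of re-builder at this point.)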

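This UI can handle raw expressions, but here we only care about rx. Each time the expression is updated, it highlights every possible match. Although it is not as dynamic or programmatic, it is handy for quick experiments and checks.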
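To avoid switching the syntax by hand every time, the default can be set in the init file. This is a small configuration sketch; it relies on the `reb-re-syntax` user option from re-builder.el.

```elisp
(require 're-builder)

;; Make re-builder start in rx syntax instead of its default.
(setq reb-re-syntax 'rx)
```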

Summary<a id="sec-6"></a>

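The rx macro is no substitute for learning regular expressions, because its DSL does not cover every detail. But writing raw regular expressions is painful, so rx lets you build them at a higher level of abstraction and with clearer semantics. Not every construct is covered above, only some commonly used features; if in doubt, read the function documentation directly.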
