Regular Expression

I spent three hours learning about regular expressions. The lesson comes from The Linux Command Line, a free e-book about the Linux command line and bash. To demonstrate regular expressions, the book uses the grep command.

The name grep is actually derived from the phrase "global regular expression print".

Metacharacters

Regular expression metacharacters consist of the following:
^ $ . [] {} - ? * + () | \
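To get a feel for a few of these metacharacters, here is a minimal sketch. It assumes Python's re module as a stand-in for grep (it is not a POSIX engine, but these particular metacharacters behave the same way in it). The word list and the matching helper are made up for illustration.

```python
import re

# Assumption: Python's `re` stands in for grep; the sample words are invented.
lines = ["zip", "unzip", "gzip", "bzip2", "zipper"]


def matching(pattern, candidates):
    """Return the candidates that contain a match, roughly like `grep pattern`."""
    return [text for text in candidates if re.search(pattern, text)]


anchored_start = matching(r"^zip", lines)     # ^ anchors the match at the start
anchored_end = matching(r"zip$", lines)       # $ anchors the match at the end
any_char = matching(r"^.zip", lines)          # . matches any single character
bracket = matching(r"^[bg]zip", lines)        # [] matches one character from a set
alternation = matching(r"^(un|g)zip", lines)  # () groups, | separates alternatives
optional = matching(r"^zip(per)?$", lines)    # ? makes the preceding group optional

print(anchored_start)  # ['zip', 'zipper']
print(anchored_end)    # ['zip', 'unzip', 'gzip']
print(any_char)        # ['gzip', 'bzip2']
print(alternation)     # ['unzip', 'gzip']
```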

POSIX Character Classes

First, we need to understand a little of the history of character encoding:

Back when Unix was first developed, it only knew about ASCII characters, and this feature reflects that fact. In ASCII, the first 32 characters (numbers 0-31) are control codes (things like tabs, backspaces, and carriage returns). The next 32 (32-63) contain printable characters, including most punctuation characters and the numerals zero through nine. The next 32 (numbers 64-95) contain the uppercase letters and a few more punctuation symbols. The final 31 (numbers 96-127) contain the lowercase letters and yet more punctuation symbols. Based on this arrangement, systems using ASCII used a collation order that looked like this:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
This differs from proper dictionary order, which is like this:
aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
As the popularity of Unix spread beyond the United States, there grew a need to support characters not found in U.S. English. The ASCII table was expanded to use a full eight bits, adding characters numbers 128-255, which accommodated many more languages. To support this ability, the POSIX standards introduced a concept called a locale, which could be adjusted to select the character set needed for a particular location.
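The two collation orders quoted above can be reproduced with a small sketch. Sorting by code point gives the ASCII order. The sort key used for dictionary order is my own approximation of the "aAbBcC..." sequence, not how a real locale is implemented.

```python
letters = list("bBaAcC")

# Sorting by code point reproduces the ASCII collation order:
# every uppercase letter (65-90) sorts before any lowercase letter (97-122).
ascii_order = sorted(letters)

# Assumed approximation of dictionary order: compare case-insensitively,
# then put the lowercase form first, as in the "aAbBcC..." sequence.
dictionary_order = sorted(letters, key=lambda ch: (ch.lower(), ch.isupper()))

print("".join(ascii_order))       # ABCabc
print("".join(dictionary_order))  # aAbBcC
```

This is why a range such as A-Z can behave differently depending on the collation order in use: in ASCII order it spans only the uppercase letters, while in dictionary order lowercase letters fall between A and Z.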

We also need to keep in mind that pathname expansion and regular expressions are two different things, even though POSIX character classes can be used in both.
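A quick way to see the difference is to compare how * behaves in each. The sketch below assumes Python's fnmatch module as a stand-in for shell pathname expansion and re as a stand-in for grep. Neither is assumed to understand POSIX character classes, so it only illustrates the * difference.

```python
import fnmatch
import re

name = "abc"

# Pathname-expansion style: * on its own matches any run of characters.
glob_hit = fnmatch.fnmatchcase(name, "a*")

# Regular-expression style: * repeats the preceding element, so "a*" means
# "zero or more a" and cannot cover the whole of "abc".
regex_full = re.fullmatch(r"a*", name) is not None

# The regex counterpart of the glob "a*" needs "." to supply the element.
regex_equiv = re.fullmatch(r"a.*", name) is not None

print(glob_hit, regex_full, regex_equiv)  # True False True
```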

BRE and ERE

BRE (basic regular expressions): the following metacharacters are recognized:
^ $ . [] *
ERE (extended regular expressions): in addition to the BRE metacharacters, the following metacharacters (and their associated functions) are added:
() {} ? + |

The “(”, “)”, “{”, and “}” characters are treated as metacharacters in BRE if they are escaped with a backslash, whereas with ERE, preceding any metacharacter with a backslash causes it to be treated as a literal.
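To convince myself of this escaping rule, I sketched a small converter that rewrites a BRE pattern as the equivalent ERE. The bre_to_ere function is a hypothetical helper of my own, not part of grep or of any library. It deliberately ignores bracket expressions and any vendor extensions, and it uses Python's re module (whose treatment of these characters matches ERE) only to check the result.

```python
import re

# In BRE these are metacharacters only when preceded by a backslash.
ESCAPED_META_IN_BRE = set("(){}")
# In BRE these are plain literals when they appear unescaped.
LITERAL_IN_BRE = set("(){}?+|")


def bre_to_ere(pattern):
    """Hypothetical helper: rewrite a BRE pattern as an equivalent ERE.

    Simplification: bracket expressions such as [(] are not handled.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # \( \) \{ \} are metacharacters in BRE, so they become bare in ERE.
            out.append(nxt if nxt in ESCAPED_META_IN_BRE else ch + nxt)
            i += 2
        elif ch in LITERAL_IN_BRE:
            # Bare ( ) { } ? + | are literals in BRE, so ERE needs a backslash.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(bre_to_ere(r"a\{2\}"))  # a{2}
print(bre_to_ere("a{2}"))     # a\{2\}

# The converted patterns behave as the rule predicts.
print(re.fullmatch(bre_to_ere(r"a\{2\}"), "aa") is not None)  # True
print(re.fullmatch(bre_to_ere("a{2}"), "a{2}") is not None)   # True
```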

Finally, what does POSIX stand for? POSIX stands for Portable Operating System Interface (with the "X" added to the end for extra snappiness).
