Miscellaneous Notes on Regular Expressions

Date: 2007-02-07 Source: wangjian98

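Some of the metacharacters used in regular expressions are listed below:

.  matches any single character;
$  matches the end of a line;
^  matches the beginning of a line;
*  matches zero or more occurrences of the character immediately before it; for example, the regular expression .* matches any number of any characters;
\  the quoting character, used to match the metacharacters listed here as ordinary characters; for example, \$ matches only a dollar sign rather than the end of a line, and \. matches only a literal dot;
[]  matches any one of the characters inside the brackets;
[c1-c2]  a hyphen "-" inside the brackets specifies a character range; for example, [0-9] matches any digit. Several ranges can be given together, as in [A-Za-z], which matches any upper-case or lower-case letter;
[^c1-c2]  a ^ inside the brackets means "exclude": the expression matches any character outside the specified range, that is, the complement of the range; for example, [^269A-Z] matches any character except 2, 6, 9 and the upper-case letters;
\< \>, \b \b  match the beginning and end of a word; for example, \<the or \bthe matches the "the" in the string "for the wise" but not the "the" in "otherwise";
\( \)  defines the expression between "\(" and "\)" as a "group" and saves the characters that match it in a temporary area; a single regular expression can save at most 9 groups, which are referenced as \1 through \9;
|  performs a logical "or" on two match conditions; for example, A|B matches strings that match either regular expression A or regular expression B;
+  matches one or more occurrences of the character immediately before it;
?  matches one or zero occurrences of the character immediately before it;
\{i\}  matches exactly i occurrences of the expression before it; for example, A[0-9]\{3\} matches an "A" followed by exactly 3 digits;
\{i,j\}  sets the number of occurrences to the range i through j; for example, [0-9]\{4,6\} matches any run of 4, 5 or 6 consecutive digits.

from 《正则表达式之道》

{m,n}?
Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible. This is the non-greedy version of the previous qualifier. For example, on the 6-character string 'aaaaaa', a{3,5} will match 5 "a" characters, while a{3,5}? will only match 3 characters.

*?, +?, ??
The "*", "+", and "?" qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding "?" after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
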
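
The greedy and non-greedy behaviour described above is easy to check interactively. The following is only a minimal sketch using Python's re module; the variable names (html, run, greedy, lazy and so on) are made up for illustration and are not part of the quoted reference.

```python
import re

html = "<H1>title</H1>"

# Greedy: .* takes as much text as it can, so the match runs to the last '>'.
greedy = re.match(r"<.*>", html).group(0)

# Non-greedy: .*? stops at the first '>' that lets the whole pattern succeed.
lazy = re.match(r"<.*?>", html).group(0)

# Bounded repetition on the 6-character string used in the text.
run = "aaaaaa"
greedy_run = re.match(r"a{3,5}", run).group(0)
lazy_run = re.match(r"a{3,5}?", run).group(0)

print(greedy)      # <H1>title</H1>
print(lazy)        # <H1>
print(greedy_run)  # aaaaa
print(lazy_run)    # aaa
```

Note that the non-greedy forms still have to satisfy the rest of the pattern, so a{3,5}? takes 3 characters here only because nothing after it forces a longer match.
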
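To do regular-expression replacement in Python, first import the relevant module; in Python, all regular-expression functionality lives in the re module. For example, use the sub function to perform a replacement: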

import re
re.sub(pattern, repl, string)
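
As a concrete illustration of the call above, here is a small sketch with made-up sample strings (text, masked and swapped are hypothetical names, not from the source). It assumes Python's own syntax, in which groups are written with plain parentheses.

```python
import re

text = "Price: $5.00 or $12.50"

# \$ is a literal dollar sign, [0-9]+ is one or more digits, \. is a literal dot.
masked = re.sub(r"\$[0-9]+\.[0-9]+", "<price>", text)

# Parentheses define groups; \1 and \2 in the replacement refer back to them.
swapped = re.sub(r"([A-Za-z]+) ([A-Za-z]+)", r"\2 \1", "hello world")

print(masked)   # Price: <price> or <price>
print(swapped)  # world hello
```

re.sub returns a new string and leaves the original untouched; raw string literals (the r prefix) keep the backslashes from being interpreted by Python before the re module sees them.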

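(The descriptions of {m,n}?, *?, +? and ?? quoted above are from 《Python Library Reference》, 4.2.1 Regular Expression Syntax.)

{m,n}? and the many special sequences defined in Python that consist of "\" followed by a character do not seem to be supported in Emacs, but they work fine with the re module. When using regular expressions in Emacs, the following commands are the ones used most often: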

Replace: M-x replace-regexp
Highlight: M-x highlight-regexp, shortcut: C-x w h
Remove highlighting: M-x unhighlight-regexp, shortcut: C-x w r
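
Since the same notation differs slightly between tools, the sketch below tries a few items from the metacharacter list at the top of these notes in Python's re module. It assumes Python's spelling, where groups and intervals are written ( ) and { } without backslashes and word boundaries use \b (the re module has no \< \>). The sample strings and variable names are made up for illustration.

```python
import re

# Word boundary: \bthe matches the word "the" but not the "the" inside "otherwise".
word_hit = re.search(r"\bthe", "for the wise")
word_miss = re.search(r"\bthe", "otherwise")

# Interval: an "A" followed by exactly 3 digits.
code_hit = re.search(r"A[0-9]{3}", "part A123 shipped")

# Negated class: any character except 2, 6, 9 and the upper-case letters.
kept = re.findall(r"[^269A-Z]", "A2b6c9Z7")

# Group and back-reference: a two-letter chunk that is immediately repeated.
repeat = re.search(r"([a-z]{2})\1", "xxabab")

print(word_hit.group(0))  # the
print(word_miss)          # None
print(code_hit.group(0))  # A123
print(kept)               # ['b', 'c', '7']
print(repeat.group(0))    # abab
```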