Notes on Regular Expressions
Date: 2007-02-07  Source: wangjian98
Some of the metacharacters used in regular expressions:
.
Matches any single character;
$
Matches the end of a line (a position, not the line terminator itself);
^
Matches the beginning of a line;
*
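Matches zero or more occurrences of the character immediately before it; for example, the regular expression .* matches any number of arbitrary characters;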
\
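Escape character: makes any metacharacter listed here match as an ordinary character; for example, \$ matches only a dollar sign, not the end of a line, and \. matches only a period;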
[]
Matches any one of the characters inside the brackets;
[c1-c2]
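Use a hyphen "-" inside the brackets to specify a character range; for example, [0-9] matches any digit. Several ranges can be combined, as in [A-Za-z], which matches any uppercase or lowercase letter;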
[^c1-c2]
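A ^ right after the opening bracket means "not": it matches any character outside the listed set, i.e. the complement of the specified ranges; for example, [^269A-Z] matches any character except 2, 6, 9 and the uppercase letters;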
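\< \>, \b \b
Match the start and end of a word; for example, \<the or \bthe matches "the" in the string "for the wise" but not the "the" inside "otherwise". \< and \> are GNU grep/vi syntax; Perl-style engines such as Python's re use \b instead;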
\( \)
Defines the expression between \( and \) as a group and saves the text it matches in a temporary register; a regular expression can save up to 9 groups, referenced as \1 through \9;
|
Logical OR of two alternatives; for example, A|B matches strings that match either regular expression A or regular expression B;
+
Matches one or more occurrences of the character immediately before it;
?
Matches zero or one occurrence of the character immediately before it;
\{i\}
Matches exactly i occurrences of the preceding expression; for example, A[0-9]\{3\} matches "A" followed by exactly 3 digits;
\{i,j\}
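Matches between i and j occurrences (inclusive) of the preceding expression; for example, [0-9]\{4,6\} matches any run of 4, 5 or 6 consecutive digits.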
from 《正则表达式之道》 ("The Tao of Regular Expressions")
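The list above uses grep/sed-style basic syntax, where groups and intervals are written \( \) and \{ \}, and tools differ on whether |, + and ? need a backslash. As a sketch, assuming Python's re module (the module this note turns to later), the list's own examples translate as follows; the test strings are made up for illustration.

```python
import re

# Python's re uses Perl-style syntax: groups are ( ), intervals are {i,j},
# and word boundaries are \b (there is no \< or \>).

# . and *: any single character, and any run of characters
dot_ok = bool(re.fullmatch(r"a.c", "abc")) and bool(re.fullmatch(r"a.*c", "axyzc"))

# ^ and $ anchor to each line when re.MULTILINE is set
whole_words = re.findall(r"^\w+$", "one\ntwo words\nthree", re.MULTILINE)

# \ escapes a metacharacter so it matches literally
price = re.search(r"\$5\.00", "price: $5.00").group()

# [^...] matches the complement of the listed set
complement = re.findall(r"[^269A-Z]", "A2b6C9d")

# \bthe matches the word "the" but not the "the" inside "otherwise"
word_hits = [bool(re.search(r"\bthe", s)) for s in ("for the wise", "otherwise")]

# ( ) captures a group; \1 refers back to what it matched
repeated = re.search(r"(\w+) \1", "it is is fine").group(1)

# |, + and ? need no backslash in Python
alt_ok = bool(re.fullmatch(r"cat|dog", "dog")) and bool(re.fullmatch(r"ab+c?", "abbb"))

# {3} means exactly three; {4,6} means four to six
exact = [bool(re.fullmatch(r"A[0-9]{3}", s)) for s in ("A123", "A1234")]
runs = re.findall(r"[0-9]{4,6}", "12 1234 1234567")

print(whole_words, price, complement, word_hits, repeated, exact, runs)
# ['one', 'three'] $5.00 ['b', 'd'] [True, False] is [True, False] ['1234', '123456']
```

Note how the {4,6} interval is greedy: the seven-digit run yields "123456", and the leftover "7" is too short to match again.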
{m,n}?
Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible. This is the non-greedy version of the previous qualifier. For example, on the 6-character string 'aaaaaa', a{3,5} will match 5 "a" characters, while a{3,5}? will only match 3 characters.
*?, +?, ??
The "*", "+", and "?" qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding "?" after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
from 《Python Library Reference》-4.2.1 Regular Expression Syntax
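The quoted behavior is easy to check with Python's re module. This minimal sketch reuses the two strings from the quotation ('aaaaaa' and '<H1>title</H1>'); re.match anchors at the start of the string, so each result is the leftmost match.

```python
import re

text = "aaaaaa"  # the 6-character string from the quoted example
greedy = re.match(r"a{3,5}", text).group()   # as many repetitions as allowed
lazy = re.match(r"a{3,5}?", text).group()    # as few repetitions as allowed
print(len(greedy), len(lazy))                # 5 3

# The other non-greedy forms: +? takes one "a", ?? prefers zero
print(repr(re.match(r"a+?", text).group()), repr(re.match(r"a??", text).group()))  # 'a' ''

html = "<H1>title</H1>"
greedy_tag = re.match(r"<.*>", html).group()   # runs on to the last ">"
lazy_tag = re.match(r"<.*?>", html).group()    # stops at the first ">"
print(greedy_tag, lazy_tag)                    # <H1>title</H1> <H1>
```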
To perform regex substitution in Python, first import the corresponding module; all regular-expression functionality in Python lives in the re module. For example, to substitute with the sub function:
import re
{m,n}? and the many special sequences Python defines (which "consist of '\' and a character") do not seem to be supported in Emacs, whereas Python's re module handles them well. When working with regular expressions in Emacs, these commands are the most useful:
- Replace: M-x replace-regexp
- Highlight: M-x highlight-regexp, key binding C-x w h
- Remove highlighting: M-x unhighlight-regexp, key binding C-x w r
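For comparison with M-x replace-regexp, here is a sketch of substitution with Python's re.sub, the function the note mentions. It leans on \d-style special sequences and non-greedy matching, the kind of Python syntax the note says Emacs does not seem to support; the sample strings are invented for illustration.

```python
import re

# Hypothetical sample text, invented for this sketch.
log = "2007-02-07 build 12 ok; 2007-03-01 build 7 failed"

# \d is one of Python's "\" special sequences; \1, \2, \3 in the replacement
# string refer back to the three groups, like \1 to \9 in the list above.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", log)
print(reordered)  # 02/07/2007 build 12 ok; 03/01/2007 build 7 failed

# Non-greedy .*? removes each tag on its own instead of everything
# from the first "<" to the last ">".
stripped = re.sub(r"<.*?>", "", "<H1>title</H1>")
print(stripped)  # title

# count= caps the number of replacements; re.subn also returns how many were made.
limited = re.subn(r"\d+", "N", "a1b22c333", count=2)
print(limited)  # ('aNbNc333', 2)
```

Raw strings (r"...") keep Python from interpreting the backslashes before re sees them, which matters for both the pattern and the \1-style replacement.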