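Python tips and snippets on assorted topics

Date: 2009-05-25  Source: didonglin

Reposted from: http://wiki.woodpecker.org.cn/moin/PyTips

1. Assorted practical code snippets

1.1. Using regular expressions

I am working with regular expressions, so I translated part of the Python documentation along the way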

::-- ZoomQuiet [2005-04-28 04:15:10]

Date: 2005-4-28 11:08 AM
Subject: [python-chinese] I am working with regular expressions, so I translated part of the Python documentation along the way

Most of the rules are the same as in other languages, but a few differ. I have a task at hand that needs regular expressions, so I translated Python's help text along the way. It is not very formally organized, but it should be readable.

###########################################################
Special characters:
###########################################################

"."          Matches any single character except "\n". To match any character including "\n", use the S (DOTALL) flag or a pattern such as "[\s\S]" (inside a character set "." is a literal dot, so "[.\n]" does not work in Python).
"^"          Matches the start of the input string.
"$"          Matches the end of the input string, or just before a newline at the end of the string.
"*"          Matches the preceding subexpression zero or more times (greedy). For example, 'zo*' matches "z" as well as "zoo". Equivalent to {0,}.
"+"          Matches the preceding subexpression one or more times (greedy). For example, 'zo+' matches "zo" and "zoo" but not "z". Equivalent to {1,}.
"?"          Matches the preceding subexpression zero or one time (greedy).
*?,+?,??     Non-greedy versions of the three special characters above.
{m,n}        Matches at least m and at most n times (m and n are non-negative integers with m <= n).
{m,n}?       Non-greedy version of the above.
"\\"         Either escapes special characters or signals a special sequence.
[]           Denotes a set of characters and matches any one character in the set. A "^" as the first character makes it a complemented set.
"|"          A|B matches either A or B.
(...)        Matches the RE inside the parentheses and captures the match. The contents can be retrieved or matched later in the string.
(?iLmsux)    Sets the I, L, M, S, U, or X flag (see below).
(?:...)      Non-grouping version of regular parentheses.
(?P<name>...) The substring matched by the group is accessible by name.
(?P=name)    Matches the text matched earlier by the group named name.
(?#...)      A comment; ignored.
(?=...)      Matches if ... matches next, but doesn't consume the string.
(?!...)      Matches if ... doesn't match next.

The special sequences consist of "\\" and a character from the list below. If the ordinary character is not on the list, then the resulting RE will match the second character.

\number      Matches the contents of the group of the same number.
\A           Matches only at the start of the string.
\Z           Matches only at the end of the string.
\b           Matches the empty string, but only at the start or end of a word (a word boundary).
\B           Matches the empty string, but not at the start or end of a word (a non-boundary).
\d           Matches a digit character. Equivalent to [0-9].
\D           Matches a non-digit character. Equivalent to [^0-9].
\s           Matches any whitespace character, including space, tab, form feed and so on. Equivalent to [ \f\n\r\t\v].
\S           Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\w           Matches any word character, including the underscore. Equivalent to [A-Za-z0-9_]. With LOCALE, it will match the set [0-9_] plus characters defined as letters for the current locale.
\W           Matches the complement of \w (any non-word character). Equivalent to [^A-Za-z0-9_].
\\           Matches a literal backslash.

##########################################################
The following functions are available:
##########################################################

match        Match a regular expression pattern at the beginning of a string.
search       Search a string for a match to the pattern.
sub          Substitute occurrences of a pattern found in a string.
subn         Same as sub, but also return the number of substitutions made.
split        Split a string by the occurrences of a pattern.
findall      Find all occurrences of a pattern in a string.
compile      Compile a pattern into a RegexObject.
purge        Clear the regular expression cache.
escape       Backslash all non-alphanumerics in a string.

Some of the functions in this module take flags as optional parameters:

I  IGNORECASE  Perform case-insensitive matching.
L  LOCALE      Make \w, \W, \b, \B dependent on the current locale.
M  MULTILINE   "^" matches the beginning of lines as well as the string. "$" matches the end of lines as well as the string.
S  DOTALL      "." matches any character at all, including the newline.
X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
U  UNICODE     Make \w, \W, \b, \B dependent on the Unicode locale.

This module also defines an exception 'error'.

compile(pattern, flags=0)
    Compile a regular expression pattern, returning a pattern object.

escape(pattern)
    Escape all non-alphanumeric characters in pattern.

findall(pattern, string)
    Return a list of all non-overlapping matches in the string. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

finditer(pattern, string)
    Return an iterator over all non-overlapping matches in the string. For each match, the iterator returns a match object. Empty matches are included in the result.

match(pattern, string, flags=0)
    Try to apply the pattern at the start of the string, returning a match object, or None if no match was found.

purge()
    Clear the regular expression cache.

search(pattern, string, flags=0)
    Scan through string looking for a match to the pattern, returning a match object, or None if no match was found.

split(pattern, string, maxsplit=0)
    Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings.

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl.

subn(pattern, repl, string, count=0)
    Return a 2-tuple containing (new_string, number). new_string is the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in the source string by the replacement repl. number is the number of substitutions that were made.

template(pattern, flags=0)
    Compile a template pattern, returning a pattern object.
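A few of the entries above are easier to see in action. The following is a minimal sketch in Python 3 (the original notes target Python 2); the sample text and the variable names are made up for illustration only.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Name: Alice, Age: 30\nName: Bob, Age: 25"

# Named groups, (?P<name>...), give readable access to captured substrings.
pattern = re.compile(r"Name: (?P<name>\w+), Age: (?P<age>\d+)")

first = pattern.search(text)             # first match anywhere in the string
names = [m.group("name") for m in pattern.finditer(text)]
pairs = pattern.findall(text)            # list of tuples, one item per group

# Greedy versus non-greedy quantifiers on the same input.
greedy = re.search(r"<.*>", "<a><b>").group()
lazy = re.search(r"<.*?>", "<a><b>").group()

# subn returns the new string and the number of substitutions made.
masked, count = re.subn(r"\d+", "N", text)

# MULTILINE makes "^" match at the start of every line, not only the string.
line_starts = re.findall(r"^Name", text, re.MULTILINE)

print(first.group("name"), first.group("age"))
print(names, pairs, greedy, lazy, count, len(line_starts))
```

Compiling the pattern once and reusing the compiled object is a convenience here, not a requirement; the module-level functions accept the pattern string directly.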

{PyRe}

1.2. Checking md5sums automatically

From: steve <[email protected]>

    #!/usr/local/bin/python
    # Python 2 script: compare the output of the md5sum command
    # with a checksum typed in by the user.

    import commands

    file = raw_input("Enter the filename: ")
    sum = raw_input("Enter the md5sum: ")
    md = "md5sum " + file
    print md
    check = str(commands.getoutput(md))
    # md5sum prints the hash and the file name separated by two spaces
    checksum = sum + "  " + file
    #print checksum
    print check
    if check == checksum:
        print "Sums OK"
    else:
        print "Sums are not the same!"
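The same check can be done without calling an external md5sum program. This is a sketch in Python 3 using the standard hashlib module, not the author's method; the function names md5_of_file and sums_match are hypothetical.

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def sums_match(path, expected):
    """Compare case-insensitively, ignoring surrounding whitespace."""
    return md5_of_file(path) == expected.strip().lower()
```

A call such as sums_match("archive.tar.gz", "<expected hex digest>") returns True or False; reading in chunks keeps memory use flat for large files.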

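1.3. Extracting hyperlinks from a web page

    import re

    # `a` is assumed to hold the HTML text of the page
    pattern = r'<a(?:(?:\s*.*?\s)|(?:\s+))href=(?P<url>\S*?)(?:(?:\s.*>)|(?:>)).*?</a>'
    re.compile(pattern).findall(a)

This method was worked out by hoxide and Tiancheng (天成) in discussion; it extracts the hyperlinks from a web page.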
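For comparison, here is a sketch of a different technique: Python 3's standard html.parser module instead of a regular expression. It is not the method discussed above; the class name LinkCollector, the helper extract_links and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links

sample = '<p><a class="x" href="http://example.com/a">A</a> <A HREF="/b">B</A></p>'
print(extract_links(sample))
```

A parser tolerates attribute order and letter case, which a single regular expression has to spell out by hand.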

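1.4. Logging in to a website from Python

I just read xyb's code and got some ideas. I wrote a short piece to try it out, and it can log in now. Heh.

    import httplib
    import urllib

    user = ?   # placeholder in the original: your login name
    pwd = ?    # placeholder in the original: your password
    params = urllib.urlencode({"Loginname": user, "Loginpass": pwd,
                               "firstlogin": 1, "option": "登入论坛"})
    headers = {"Accept": "text/html", "User-Agent": "IE",
               "Content-Type": "application/x-www-form-urlencoded"}
    website = "www.linuxforum.net"
    path = "/forum/start_page.php"
    conn = httplib.HTTPConnection(website)
    conn.request("POST", path, params, headers)
    r = conn.getresponse()
    print r.status, r.reason
    data = r.read()
    print data
    conn.close()

I wonder what the difference is between submitting the data through a form and sending the request directly?

China Linux Forum
Summarized by xyb: PythonClientCookie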
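On the question of form submission versus a direct request: a browser submitting a form with the POST method sends a url-encoded body much like the one built by hand above, plus whatever headers and cookies the browser adds. The sketch below, in Python 3, only builds such a request and does not send it. The credentials are placeholders, and the "option" field from the original is left out to keep the body plain ASCII.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

# Placeholder credentials; the field names come from the snippet above.
form = {"Loginname": "user", "Loginpass": "secret", "firstlogin": 1}
body = urlencode(form).encode("ascii")

request = Request(
    "http://www.linuxforum.net/forum/start_page.php",
    data=body,                       # supplying data makes this a POST
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(request.get_method())
print(parse_qs(body.decode("ascii")))
# Sending it would be urllib.request.urlopen(request); that is not done here.
```

Keeping a login session across requests needs cookie handling as well, which is what the PythonClientCookie summary refers to.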

1.5. Output formats for floating-point numbers

    >>> a=6200-6199.997841
    >>> a
    0.0021589999996649567
    >>> print "%f"%a
    0.002159
    >>> import fpformat
    >>> fpformat.fix(a, 6)
    '0.002159'
    >>> print fpformat.fix(a, 6)
    0.002159
    >>> print "%.6f"%a
    0.002159
    >>> print "%.7f"%a
    0.0021590
    >>> print "%.10f"%a
    0.0021590000
    >>> print "%.5f"%a
    0.00216
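The fpformat module no longer exists in Python 3, but the "%" formatting from the session still works, and format() and f-strings accept the same precision spec. A short Python 3 sketch with illustrative variable names:

```python
a = 6200 - 6199.997841

fixed6 = "%.6f" % a              # printf-style, as in the session above
fixed6_fmt = format(a, ".6f")    # the format() built-in gives the same text
fixed5 = f"{a:.5f}"              # f-strings accept the same format spec
fixed10 = "%.10f" % a
rounded = round(a, 6)            # rounds the value itself; still a float

print(fixed6, fixed6_fmt, fixed5, fixed10, rounded)
```

Formatting changes only the text that is printed; round() returns a new float, which is itself still subject to binary representation error.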

1.6. How to download an image from the web to a local file

> I know the URL of an image,
> for example http://www.yahoo.com/images/logo.gif
> How do I download it and save it locally?

    import urllib

    urllib.urlretrieve(url, filename)

---Limodou
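In Python 3 the same function lives in urllib.request. The sketch below wraps it in a hypothetical download helper and, so that it runs without network access, uses a file:// URL as a stand-in for the image URL from the question.

```python
import pathlib
import tempfile
import urllib.request

def download(url, filename):
    """Save the resource at `url` to `filename` and return the local path."""
    local_path, _headers = urllib.request.urlretrieve(url, filename)
    return local_path

# Demonstration without network access: a local file plays the remote image.
workdir = pathlib.Path(tempfile.mkdtemp())
source = workdir / "logo.gif"
source.write_bytes(b"GIF89a")             # stand-in bytes, not a real image
target = workdir / "copy.gif"

download(source.as_uri(), str(target))
print(target.read_bytes())
```

For a real image, pass the http:// URL instead of source.as_uri().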

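1.7. Using locale to determine the local language and encoding

from:: Limodou's study notes

Software that supports Unicode often has to convert between Unicode and various other encodings.

To process a local file, you first read its content and convert it to Unicode, process it inside the program, and then save it back in the original encoding.

If the exact encoding of the file is unknown, the default encoding can be used, and the locale module can tell us what that default is.

    >>> import locale
    >>> print locale.getdefaultlocale()
    ('zh_CN', 'cp936')

As you can see, the default language on my machine is Simplified Chinese and the encoding is GBK (cp936).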
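The read, decode, process, encode cycle described above can be sketched as follows in Python 3. locale.getpreferredencoding() is used to query the platform default (getdefaultlocale() is deprecated in recent Python 3 releases); the roundtrip helper and the sample text are hypothetical.

```python
import locale

# The encoding the platform prefers for text; the value depends on the
# machine, so no particular result is assumed here.
encoding = locale.getpreferredencoding(False)

def roundtrip(raw_bytes, codec):
    """Decode bytes to str, process as Unicode, encode back with the same codec."""
    text = raw_bytes.decode(codec)
    processed = text.upper()         # stand-in for real processing
    return processed.encode(codec)

# cp936 (GBK) is named explicitly so the example behaves the same everywhere.
original = "python 技巧".encode("cp936")
result = roundtrip(original, "cp936")
print(encoding, result.decode("cp936"))
```

In a real program the codec argument would be `encoding`, or whatever encoding the file is known to use.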

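1.8. Using __new__

from: China Linux Forum -rings

__new__

__new__ is a method of object in Python. To override __new__, your class must inherit from object. __new__ behaves like a class method (strictly speaking it is a static method that receives the class as its first argument); it takes no self parameter. __new__ and __init__ are different. __init__ takes a self parameter, so it is called after the object has already been constructed. If you want to do something while the object is being constructed, you need __new__. The return value of __new__ must be an instance of the object. __new__ is very useful in certain patterns. Let's look at an example, a Singleton from "Thinking in Python".

    class OnlyOne(object):
        class __OnlyOne:
            def __init__(self):
                self.val = None
            def __str__(self):
                return repr(self) + self.val

        instance = None

        def __new__(cls): # __new__ always a classmethod
            if not OnlyOne.instance:
                OnlyOne.instance = OnlyOne.__OnlyOne()
            return OnlyOne.instance

        def __getattr__(self, name):
            return getattr(self.instance, name)

        def __setattr__(self, name, value):
            # setattr needs the value as well as the name
            return setattr(self.instance, name, value)

    x = OnlyOne()
    x.val = 'sausage'
    print x
    y = OnlyOne()
    y.val = 'eggs'
    print y
    z = OnlyOne()
    z.val = 'spam'
    print z
    print x
    print y

We can see that OnlyOne inherits from object.

If you do not inherit from object, your __new__ will not be called during construction.

When x = OnlyOne() runs, what is really called is __new__(OnlyOne); this happens every time OnlyOne is instantiated,

because it is a class method.

So this code uses that property to implement a Singleton.

No matter how many objects are constructed, __new__ is always called.

OnlyOne therefore keeps a class attribute, instance.

It holds the instance of the nested __OnlyOne class.

That instance is constructed only once.

Every later construction simply returns this same instance.

So here x, y and z are all the same instance.

The idea is the same as in the typical C++ implementation of a Singleton.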
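The same idea can be written more compactly in Python 3, where every class inherits from object. This is a simplified sketch rather than the book's example: the class name Registry is hypothetical, and the nested helper class is dropped.

```python
class Registry:
    """Hypothetical class; only one instance is ever created."""
    _instance = None

    def __new__(cls):
        # __new__ runs before __init__ and decides which object is returned.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.val = None
        return cls._instance

x = Registry()
x.val = "sausage"
y = Registry()
y.val = "eggs"
print(x is y, x.val)
```

If such a class also defines __init__, it runs again on every Registry() call, because the object returned by __new__ is an instance of the class; one-time setup is therefore done inside __new__ here.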

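1.9. Handling tracebacks

from:: Limodou's study notes

Tracebacks are very useful in Python: they show the state of the call stack when an exception occurs. When we catch an exception, however, we usually run our own error handling, so the call stack information is no longer displayed. To show it anyway, use the traceback module.

This is only a brief introduction to the traceback module, not a complete description, and it covers just what I needed myself. For details, see the documentation.

Printing the full traceback

Let's first look at what a traceback looks like:

    >>> 1/0

    Traceback (most recent call last):
      File "", line 1, in -toplevel-
        1/0
    ZeroDivisionError: integer division or modulo by zero

As you can see, the traceback Python shows by default has a header (line 1), the detailed location of the error (lines 2 and 3), and the exception message (line 4). In other words it has three parts, and the traceback module can handle each of them separately. I am more interested in the full display, though.

The traceback module provides print_exc([limit[, file]]), which prints the same output as above. The limit argument restricts the number of stack entries shown, and the file argument sends the traceback to a file object. By default the output goes to standard error. For example:

    >>> import traceback
    >>> try:
    ...     1/0
    ... except:
    ...     traceback.print_exc()
    ...
    Traceback (most recent call last):
      File "", line 2, in ?
    ZeroDivisionError: integer division or modulo by zero

When an exception occurs, sys.exc_info() returns information about it. For example:

    >>> import sys
    >>> try:
    ...     1/0
    ... except:
    ...     sys.exc_info()
    ...
    (<class exceptions.ZeroDivisionError at 0x00BF4CC0>, <exceptions.ZeroDivisionError instance at 0x00E29DC8>, <traceback object at 0x00E29DF0>)

sys.exc_info() returns a tuple: the exception class, the exception instance, and the traceback object.

print_exc() writes its output directly. What if we want the content itself? Use format_exception(type, value, tb[, limit]), where type, value and tb are the three values returned by sys.exc_info(). For example:

    >>> import sys, traceback
    >>> try:
    ...     1/0
    ... except:
    ...     type, value, tb = sys.exc_info()
    ...     print traceback.format_exception(type, value, tb)
    ...
    ['Traceback (most recent call last):\n', ' File "", line 2, in ?\n', 'ZeroDivisionError: integer division or modulo by zero\n']

So format_exception returns a list of strings, which we can then use in our own programs.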
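When only the complete text is wanted, traceback.format_exc() returns what print_exc() would print, as a single string. A minimal Python 3 sketch; the safe_divide helper is hypothetical.

```python
import traceback

def safe_divide(a, b):
    """Return (result, None), or (None, traceback text) on division by zero."""
    try:
        return a / b, None
    except ZeroDivisionError:
        # format_exc() returns what print_exc() would print, as one string.
        return None, traceback.format_exc()

value, trace_text = safe_divide(1, 0)
print(trace_text)
```

The returned string can be written to a log file or shown in a dialog, which is the typical reason for capturing it instead of printing it.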

1.10. Using os.walk() to fix the CVS root

After I reinstalled the system, the Windows drive letters were thoroughly scrambled: what used to be 'e:\cvsroot' is now 'g:\cvsroot', and the many directories managed by CVS stopped working. Time for a Python script:

    import os
    from os.path import join
    import sys

    print sys.argv[1]
    for root, dirs, files in os.walk(sys.argv[1]):
        if 'CVS' in dirs:
            # path of the file that stores the repository root
            fn = join(root, 'CVS', 'ROOT')
            print root+' :', fn
            #dirs.remove('CVS') # don't visit CVS directories
            f = open(fn,'r')
            r = f.read()
            print r
            f.close()
            if r.startswith(r'e:\cvsroot'):
                open(fn, 'w').write(r'g:\cvsroot')
                # read the file back to confirm the change
                f = open(fn,'r')
                r = f.read()
                print r
                f.close()
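A Python 3 sketch of the same walk, packaged as a function and exercised on a throwaway directory tree. Unlike the script above it replaces only the leading root and keeps the rest of the file content; the function name rewrite_cvs_root and the test tree are made up for illustration.

```python
import os
import tempfile

def rewrite_cvs_root(top, old_root, new_root):
    """Replace old_root with new_root in every CVS/Root file under top."""
    changed = []
    for dirpath, dirnames, _filenames in os.walk(top):
        if "CVS" not in dirnames:
            continue
        root_file = os.path.join(dirpath, "CVS", "Root")
        if not os.path.isfile(root_file):
            continue
        with open(root_file) as handle:
            content = handle.read()
        if content.startswith(old_root):
            with open(root_file, "w") as handle:
                handle.write(new_root + content[len(old_root):])
            changed.append(root_file)
        dirnames.remove("CVS")       # prune: no need to descend into CVS

    return changed

# Build a tiny throwaway tree to exercise the function.
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "proj", "CVS"))
with open(os.path.join(top, "proj", "CVS", "Root"), "w") as handle:
    handle.write("e:\\cvsroot\n")

changed = rewrite_cvs_root(top, "e:\\cvsroot", "g:\\cvsroot")
print(changed)
```

Removing an entry from dirnames in place is how os.walk is told, in its default top-down mode, to skip a subdirectory.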

2. A complete reference for multi-process handling in Python

* PyCourse --from

http://blog.huangdong.com (HD's personal blog, soon to be history; a moment of silence, everyone)

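3. Converting your Python script into a Windows exe program

from:: http://blog.huangdong.com (HD's personal blog, soon to be history; a moment of silence, everyone)

Many possible benefits could be listed for turning a Python script into an executable Windows exe program; my favorite is that it makes the program you wrote feel more like a "program". But everything has its pros and cons, and doing this inevitably gives up some of Python's advantages.

Information about py2exe and the py2exe-0.4.2.win32-py2.3.exe installer can be found online. It is still somewhat cumbersome to use, though: you have to write a small script by hand, like this:

    # setup.py
    from distutils.core import setup
    import py2exe

    setup(name="myscript",
          scripts=["myscript.py"],
    )

Then run it with python:

    python setup.py py2exe

to use it. See its website for more information.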

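4. Examples of using the WinAPI

/PyWinApi -- simple examples

5. Determining the caller from inside a function!

AlbertLee
/PyCallParent

6. The Python philosophy: the power of introspection

AlbertLee
Prompted by Xie Yanbo
Remember, Python comes with batteries included!
PyBatteriesIncluded -- use introspection to obtain rich information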

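7. A pitfall when embedding comments in a regular expression

As the following code shows:

    >>> import re
    >>> s = 'create table testtable'
    >>> p = r"""
    ... ^create\ table  # create table
    ... \s*             # whitespace
    ... ([a-zA-Z]*)     # table name
    ... $               # end
    ... """
    >>> re.compile(p, re.VERBOSE).match(s).groups()
    ('testtable',)
    >>>

If the escaped space (that is, "\ ") between create and table were missing, re.VERBOSE would ignore that space, the pattern would effectively be matching createtable, and the match would fail.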
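The pitfall can be shown side by side. In this Python 3 sketch the pattern names escaped, unescaped and with_class are made up for illustration; the last one shows a character class as another way to keep a literal space in VERBOSE mode.

```python
import re

s = "create table testtable"

# Escaped space: survives VERBOSE mode and matches the literal space.
escaped = re.compile(r"""
    ^create\ table   # keyword pair, space escaped
    \s*              # whitespace
    ([a-zA-Z]*)      # table name
    $                # end
""", re.VERBOSE)

# Unescaped space: VERBOSE drops it, so the pattern means "createtable...".
unescaped = re.compile(r"""
    ^create table    # the space is silently ignored
    \s*
    ([a-zA-Z]*)
    $
""", re.VERBOSE)

# Whitespace inside a character class is also preserved in VERBOSE mode.
with_class = re.compile(r"^create[ ]table \s* ([a-zA-Z]*) $", re.VERBOSE)

print(escaped.match(s).groups())
print(unescaped.match(s))
print(with_class.match(s).groups())
```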

8. A Python program that converts numbers to Chinese

This came from a question asked by Jaina (16009966) on QQ. I spent an evening implementing it. The basic idea is to treat every 4 digits as one segment, convert each segment with conv4, and then combine the results in conv. The program was debugged under Windows 2003 with Python 2.4. Mind the encoding issues.

    # coding:utf-8

    UUNIT=[u'', u'十' , u'百' , u'千']
    BUINT = [u'', u'万', u'亿', u'万亿' , u'亿亿']
    NUM=[u'零',u'一',u'二', u'三', u'四', u'五' , u'六', u'七', u'八', u'九']

    def conv4(num, flag=False):
        # convert a number of at most 4 digits
        ret = u''
        s = str(num)
        l = len(s)
        assert(len(s) <= 4)
        if flag and len(s)<4:
            ret = ret + NUM[0]
        for i in xrange(l):
            if s[i] != '0':
                ret = ret + NUM[int(s[i])]+UUNIT[l-i-1]
            elif s[i-1] != '0':
                ret = ret + NUM[0]
        return ret

    def conv(num):
        # split the digits into 4-digit segments and combine the results
        ss = str(num)
        l = len(ss)
        j = l / 4
        jj = l % 4
        lss = [ss[0:jj] for i in [1] if ss[0:jj]] \
            + [ss[i*4+jj:(i+1)*4+jj] for i in xrange(j) if ss[i*4+jj:(i+1)*4+jj] ]
        print lss
        ul = len(lss)
        ret = u''
        zflag = False
        for i in xrange(ul):
            bu = BUINT[ul-i-1]
            tret = conv4(int(lss[i]), flag = i)
            if tret[-1:] == NUM[0]:
                tret = tret[:-1]
            if tret:
                print zflag , (tret+bu).encode('mbcs')
                if zflag and tret[0] != NUM[0] :
                    ret = ret + NUM[0] +tret+bu
                else:
                    ret = ret + tret+bu
                zflag = False
            else:
                zflag = True
        return ret

    if __name__ == '__main__':
        #print conv(11111)
        print conv(103056).encode('mbcs')
        print conv(101000).encode('mbcs')
        print conv(1200999100000000010).encode('mbcs')