Python Iterators
Date: 2010-09-06 Source: 菜刀大侠
Study material:
Learning Python, 4th edition
Example code:
### file: script1.py

# A first Python script
import sys # Load a library module
print(sys.platform)
print(2 ** 100) # Raise 2 to a power
x = 'Spam!'
print(x * 8) # String repetition
Testing:
Result 1:
>>> f = open('script1.py')
>>> f.readline()
'### file: script1.py\n'
>>> f.readline()
'\n'
>>> f.readline()
'# A first Python script\n'
>>> f.readline()
'import sys # Load a library module\n'
>>> f.readline()
'print(sys.platform)\n'
>>> f.readline()
'print(2 ** 100) # Raise 2 to a power\n'
>>> f.readline()
"x = 'Spam!'\n"
>>> f.readline()
'print(x * 8) # String repetition\n'
>>> f.readline()
''
>>> f.readline()
''
>>> f.readline()
''
>>> f.readline()
''
>>>
As you can see, repeating the call by hand like this is not very convenient.
Next example:
Using __next__
Result 2:
>>> f = open('script1.py')
>>> f.readline()
'### file: script1.py\n'
>>> f.__next__()
'\n'
>>> f.__next__()
'# A first Python script\n'
>>> f.__next__()
'import sys # Load a library module\n'
>>> f.__next__()
'print(sys.platform)\n'
>>> f.__next__()
'print(2 ** 100) # Raise 2 to a power\n'
>>> f.__next__()
"x = 'Spam!'\n"
>>> f.__next__()
'print(x * 8) # String repetition\n'
>>> f.__next__()
Traceback (most recent call last):
  File "<pyshell#25>", line 1, in <module>
    f.__next__()
StopIteration
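Because the __next__ method is used here, the final call raises a StopIteration exception, which signals that the whole file has been read.
By contrast, the repeated f.readline() calls above never raise an error at the end of the file; they simply return an empty string.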
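The two end-of-file signals can be checked side by side in a short script. This is only a minimal sketch: it assumes a throwaway temporary file instead of script1.py, and the names path and results are made up for illustration.

```python
import os
import tempfile

# Write a two-line temporary file (a hypothetical stand-in for script1.py).
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as tmp:
    tmp.write("import sys\nprint(sys.platform)\n")
    path = tmp.name

results = {}

with open(path) as f:
    f.readline()
    f.readline()
    # Past the end of the file, readline() returns an empty string.
    results['readline_at_eof'] = f.readline()

with open(path) as f:
    next(f)
    next(f)
    try:
        f.__next__()                      # past the end: raises StopIteration
        results['next_at_eof'] = 'no exception'
    except StopIteration:
        results['next_at_eof'] = 'StopIteration'

os.remove(path)
print(results)
```

The empty string is why a hand-written loop has to test each line, while code built on __next__ can rely on the exception to end the loop.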
Here is an improved version of the first result:
>>> for line in open('script1.py'):
	print(line.upper(), end='')

### FILE: SCRIPT1.PY
# A FIRST PYTHON SCRIPT
IMPORT SYS # LOAD A LIBRARY MODULE
PRINT(SYS.PLATFORM)
PRINT(2 ** 100) # RAISE 2 TO A POWER
X = 'SPAM!'
PRINT(X * 8) # STRING REPETITION
The best way to read a text file line by line today is to not read it at all; instead, allow the for loop to automatically call __next__ to advance to the next line on each iteration. The file object’s iterator will do the work of automatically loading lines as you go. The following, for example, reads a file line by line, printing the uppercase version of each line along the way, without ever explicitly reading from the file at all:
>>> for line in open('script1.py'): # Use file iterators to read by lines
... print(line.upper(), end='') # Calls __next__, catches StopIteration
...
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(2 ** 33)
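Summary:
The best way to read a text file is not to read it all at once; instead, let the for loop fetch it one line at a time, which saves memory.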
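What the for loop does behind the scenes can be written out by hand. The following is a rough sketch of that protocol, not the interpreter's actual implementation; io.StringIO is assumed here as a stand-in for an open text file, and upper_lines is a name invented for the example.

```python
import io

# io.StringIO stands in for an open text file (an assumption of this sketch).
f = io.StringIO("import sys\nprint(sys.platform)\n")

upper_lines = []
it = iter(f)                  # for a file-like object, iter(f) returns f itself
while True:
    try:
        line = next(it)       # same as it.__next__()
    except StopIteration:     # raised at end of file; a for loop catches this for you
        break
    upper_lines.append(line.upper())

print(''.join(upper_lines), end='')
```

The for loop version is shorter because the iter call, the next calls, and the StopIteration handling all happen automatically.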
Notice that the print uses end='' here to suppress adding a \n, because line strings already have one (without this, our output would be double-spaced). This is considered the best way to read text files line by line today, for three reasons: it’s the simplest to code, might be the quickest to run, and is the best in terms of memory usage. The older, original way to achieve the same effect with a for loop is to call the file readlines method to load the file’s content into memory as a list of line strings:
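Note that print uses end='' to suppress the extra \n, because each line string already ends with one. This is the best way to read a text file, for three reasons:
1. It is the simplest to code
2. It may be the fastest to run
3. It is the best in terms of memory usage
Next comes an older, original approach: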
>>> for line in open('script1.py').readlines():
... print(line.upper(), end='')
...
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(2 ** 33)
Trying it myself:
>>> for line in open('script1.py').readlines():
	print(line.upper(), end='')

### FILE: SCRIPT1.PY
# A FIRST PYTHON SCRIPT
IMPORT SYS # LOAD A LIBRARY MODULE
PRINT(SYS.PLATFORM)
PRINT(2 ** 100) # RAISE 2 TO A POWER
X = 'SPAM!'
PRINT(X * 8) # STRING REPETITION
Why this approach is not as good:
This readlines technique still works, but it is not considered the best practice today and performs poorly in terms of memory usage. In fact, because this version really does load the entire file into memory all at once, it will not even work for files too big to fit into the memory space available on your computer. By contrast, because it reads one line at a time, the iterator-based version is immune to such memory-explosion issues.
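The memory difference can be observed with the standard tracemalloc module. This is a hedged sketch rather than a benchmark: the 20,000-line temporary file and the helper names peak_bytes, count_with_readlines and count_with_iterator are assumptions made for the example, and the exact byte counts depend on the Python version and platform.

```python
import os
import tempfile
import tracemalloc

def peak_bytes(func, path):
    """Return (result, peak traced memory in bytes) for func(path)."""
    tracemalloc.start()
    result = func(path)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak

def count_with_readlines(path):
    with open(path) as f:
        return len(f.readlines())     # whole file held as a list of strings

def count_with_iterator(path):
    with open(path) as f:
        return sum(1 for _ in f)      # only one line is alive at a time

# Hypothetical test file: 20,000 lines of 100 characters each.
with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
    for _ in range(20000):
        tmp.write('x' * 100 + '\n')
    path = tmp.name

n_all, peak_all = peak_bytes(count_with_readlines, path)
n_iter, peak_iter = peak_bytes(count_with_iterator, path)
os.remove(path)

print(n_all, n_iter)                  # both approaches see the same lines
print(peak_all, peak_iter)            # peak memory of each approach, in bytes
```

Both functions count the same lines; only the readlines version has to keep all of them in memory at the same moment.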
The while version:
>>> f = open('script1.py')
>>> while True:
	line = f.readline()
	if not line: break
	print(line.upper(), end='')

### FILE: SCRIPT1.PY
# A FIRST PYTHON SCRIPT
IMPORT SYS # LOAD A LIBRARY MODULE
PRINT(SYS.PLATFORM)
PRINT(2 ** 100) # RAISE 2 TO A POWER
X = 'SPAM!'
PRINT(X * 8) # STRING REPETITION
However, this may run slower than the iterator-based for loop version, because iterators run at C language speed inside Python, whereas the while loop version runs Python byte code through the Python virtual machine. Any time we trade Python code for C code, speed tends to increase. This is not an absolute truth, though, especially in Python 3.0; we’ll see timing techniques later in this book for measuring the relative speed of alternatives like these.
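A rough way to compare the two loops is the standard timeit module. The sketch below is an assumption-laden illustration, not the book's timing code: it reads from an in-memory io.StringIO instead of a real file, the names TEXT, with_for and with_while are invented, and the measured times will differ from machine to machine.

```python
import io
import timeit

# Hypothetical in-memory data used in place of a real file.
TEXT = "spam and eggs\n" * 10000

def with_for():
    count = 0
    for line in io.StringIO(TEXT):    # the iterator advances inside the for loop
        count += 1
    return count

def with_while():
    count = 0
    f = io.StringIO(TEXT)
    while True:
        line = f.readline()           # explicit call on every pass
        if not line:
            break
        count += 1
    return count

for_time = timeit.timeit(with_for, number=20)
while_time = timeit.timeit(with_while, number=20)
print('for:   %.4f s' % for_time)
print('while: %.4f s' % while_time)
```

Running it several times gives a feel for how stable the difference is on a given interpreter; as the quoted passage says, the for loop being faster is a tendency, not a guarantee.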
To support manual iteration code (with less typing), Python 3.0 also provides a builtin function, next, that automatically calls an object’s __next__ method. Given an iterable object X, the call next(X) is the same as X.__next__(), but noticeably simpler. With files, for instance, either form may be used:
>>> f = open('script1.py')
>>> f.__next__() # Call iteration method directly
'import sys\n'
>>> f.__next__()
'print(sys.path)\n'
>>> f = open('script1.py')
>>> next(f) # next built-in calls __next__
'import sys\n'
>>> next(f)
'print(sys.path)\n'