A Developer's Guide to Python 3.0: Numbers, String
Date: 2009-04-11 Source: cobrawgl
PEP 237: Unifying Long Integers and Integers
Python 2.x has two integral types: int and long. The int type is limited to the machine's "native" word size (32 or 64 bits on modern machines). Before Python 2.2, operations on the int type could overflow and raise OverflowError exceptions. In contrast, the long type is limited only by the amount of available memory, and can conceptually represent any integer.

The reason for having two integer types is that int is very efficient because it has direct support in hardware and operating systems, while the long type is flexible and doesn't require the developer to keep tabs on the size of numbers. But having two types presents several problems when porting compiled Python files or pickled objects across machines with different architectures.
The goal of PEP-237 is to eventually unify these two concepts, combining them into a single integer type that changes its representation internally to use the more efficient machine integer when possible. The implementation actually stretched across four different versions: 2.2, 2.3, 2.4, and is now complete in 3.0.
Python 2.4 and higher already support auto-promotion of int to long without exceptions or warnings. Python 3.0 simply eliminated the long type and long literals at the Python level. If you try to use long in Python 3.0 you will get an error:
>>> long
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'long' is not defined

Python 3.0 also removed the L suffix for longs. Now, an integer is an integer is an integer. In Python 2.5 this is fine:

>>> 5L
5L

But in Python 3.0 it's a syntax error:

>>> 5L
  File "<stdin>", line 1
    5L
     ^
SyntaxError: invalid syntax

In Python 2.5 the following code generates a long object:

>>> x = 5 ** 88
>>> type(x)
<type 'long'>
>>> x
32311742677852643549664402033982923967414535582065582275390625L

In Python 3.0 it's an int:

>>> x = 5 ** 88
>>> type(x)
<class 'int'>
>>> x
32311742677852643549664402033982923967414535582065582275390625
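The unification can also be checked from a script rather than the interactive prompt. The following is a minimal Python 3 sketch; the helper name describe() is a hypothetical name chosen for illustration and is not part of the article or the standard library.

```python
# Python 3: one int type covers both machine-sized and very large values.

def describe(n):
    """Return the type name and bit length of an integer (hypothetical helper)."""
    return type(n).__name__, n.bit_length()

small = 5
large = 5 ** 88

# Both values report the same type; only the number of bits differs.
print(describe(small))
print(describe(large))
```

The point of the sketch is that no explicit promotion step appears anywhere: the interpreter picks the internal representation on its own.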
PEP 3127: Integer Literal Support and Syntax
Python has always supported a plethora of radices or bases for integers. The int() and long() functions in Python 2.5 accept a second argument, which is the base to convert from. The base can be any integer between 2 and 36 (inclusive):

>>> int('000111', 2)
7
>>> int('000111', 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: int() base must be >= 2 and <= 36
>>> int('000111', 36)
1333
TypeError: long() can't convert non-string with explicit base
>>> long('555555555555555555555555555555555555555', 6)
2227915756473955677973140996095L

Python 3.0 preserves this functionality (although the error message says arg 2 instead of base):

>>> int('0001111', 2)
15
>>> int('5', 36)
5
>>> int('5', 37)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: int() arg 2 must be >= 2 and <= 36

Python 2.5 also supported integer literals in octal and hexadecimal, so whenever an integer was expected you could provide an octal or hexadecimal number instead. Octal numbers required a leading zero, as in 0123; hexadecimal numbers required both a leading zero and the character x or X, as in 0x123. Finally, there are two functions called oct() and hex(), each of which takes an integer and returns its string representation in octal or hexadecimal, for example:
>>> 010
8
>>> 010 + 8
16
>>> 0xa
10
>>> 0xa + 010 + 2
20
>>> oct(20)
'024'
>>> hex(20)
'0x14'

Python 3.0 maintained all this functionality, but with one small change: the prefix for octal numbers is now a zero and the character o or O, as in 0O123, instead of just 0123. The original notation with the single leading zero was borrowed from the C programming language. The change reduces the possibility for confusion for developers unfamiliar with C-like languages or with octal numbers. The expectation of such developers is that leading zeros don't change the value of numbers. For example, they might try to use leading zeros for formatting and indentation purposes and unwittingly end up with the wrong numbers. In addition, Python 3.0 adds a binary literal. All in all, this break from the C legacy creates a uniform notation for integer literals in bases 2, 8, and 16 (binary, octal, and hexadecimal). The prefixes are 0b, 0o and 0x:

>>> 0b10
2
>>> 0o10
8
>>> 0x10
16

There is also a new bin() function that converts integers to a binary string representation (analogous to oct() and hex()):

>>> bin(5)
'0b101'
>>> bin(0x10)
'0b10000'
>>> bin(0o10)
'0b1000'

The oct() function of course uses the new 0o prefix and not the old 0 prefix as in Python 2.5:

>>> oct(12)
'0o14'

I feel that this change, while pretty minor in the grand scheme of things, is an elegant and clean win-win solution. It removed an obstacle from the path of new users, it made a clean break from the past (octal notation in C), and it unified the notation for radix literals, which is important when adding the new binary literal.
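The uniform prefixes also make it easy to go from text back to a number. The sketch below assumes Python 3; to_literals() and from_literal() are hypothetical helper names used only for illustration. It relies on the documented behavior that int() with base 0 infers the base from the 0b, 0o, or 0x prefix.

```python
def to_literals(n):
    """Render an integer in the three prefixed notations (hypothetical helper)."""
    return {'bin': bin(n), 'oct': oct(n), 'hex': hex(n)}

def from_literal(text):
    """Parse a prefixed literal; base 0 lets int() infer the base from the prefix."""
    return int(text, 0)

forms = to_literals(12)
print(forms)

# Every rendered form parses back to the original value.
print([from_literal(s) for s in forms.values()])
```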
PEP 238: Changing the Division Operator
In Python 2.x the division operator is divided (no pun intended) between integer division and float/complex division. Integer division is actually floor division, where the result is always rounded down to the nearest integer (int or long):

>>> 5 / 2
2
>>> -5 / 2
-3

Float/complex division is true division that returns a reasonable approximation of the mathematical result:

>>> 5.0 / 2
2.5
>>> -5 / 2.0
-2.5
>>> complex(5, 0) / 2
(2.5+0j)

This is arguably the most serious problem in the design of the language. In the context of writing numeric algorithms that operate on integers and floats/complex numbers it makes life really hard. For a function library writer who wants to ensure true division (I'll get to the __future__ workaround later) it is not immediately clear how to do it in a safe manner.
Suppose you want to write an average function that operates on integers, floats, and complex numbers. Here's a naive implementation:
def average(*numbers):
    return sum(numbers) / len(numbers)

This function will fail if all the numbers are integers and the average is not an integer:

>>> average(1,4)
2

So, how do you make sure you get true division? You can try something like casting all the arguments to floats, but then it wouldn't work on complex numbers. You can try adding 0.0 to each number, but if the original number was the float -0.0 you might lose the sign (yes, there is a difference between 0.0 and -0.0 according to the IEEE 754 floating point standard):

>>> x=-0.0
>>> x
-0.0
>>> x + 0.0
0.0

It turns out that the only safe way to preserve type and sign and enforce true division is by multiplying all the arguments by 1.0, a solution that might easily escape someone writing a simple average function, and one that has a non-negligible overhead:

def average(*numbers):
    return sum([n * 1.0 for n in numbers]) / len(numbers)

>>> average(1,4)
2.5

Fortunately, in Python 3.0, division is always a true division that returns a float (even if the result is a whole number) or complex (if one of the operands is complex):

>>> 5 / 2
2.5
>>> -5 / 2
-2.5
>>> 4 / 2
2.0
>>> complex(5, 0) / 2
(2.5+0j)

To get floor division in Python 3.0, use the // operator. If both operands are integers, the result will be an integer. If one of the operands is a float and the other is a float or integer, the result is a float (but a float that equals an integer). Floor division doesn't work on complex numbers:

>>> 5 // 2
2
>>> 5.0 // 2
2.0
>>> complex(5,0) // 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't take floor of a complex number.

It's not well known, but explicit floor division has been available in Python since Python 2.2. Moreover, this true division behavior is also available in Python 2.2 if you use the following statement:

from __future__ import division

This can be useful if you want to incrementally prepare your code for Python 3.0 migration. If you have numeric code, or if you call into code that uses the division operation in its Python 2.x form, you might run into nasty bugs. That's because you will have to check every call site; you can't just change the code that uses the division operator itself. A future article will give more specific information about how to handle the division operator when migrating to Python 3.0.
You can also control the division behavior by passing the -Q command-line argument to the interpreter with valid values of: old (default), warn, warnall, and new (true division); however, I don't recommend using it unless you know exactly what you are doing and why. It is pretty brittle to rely on command-line arguments to control a concept as central as division behavior.
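To tie the pieces together, here is a minimal Python 3 sketch of the naive average() function from above. Under Python 3 semantics it needs no workaround, and floor division stays available through // and divmod(). The function body is the article's; the sample values are illustrative assumptions.

```python
def average(*numbers):
    """Naive average; in Python 3 the / operator is always true division."""
    return sum(numbers) / len(numbers)

print(average(1, 4))               # integers no longer truncate
print(average(complex(5, 0), 1))   # complex operands work as well

# Floor division is explicit and rounds toward negative infinity.
print(-5 // 2)
print(divmod(-5, 2))
```

Under Python 2.x, the same file gets true division for / only if it starts with the from __future__ import division statement shown earlier.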
Floating Point Improvements
The rejected PEP-754 attempted to make Python fully IEEE 754 compliant. It was rejected because Python (the CPython implementation) relies on the underlying C library to handle IEEE 754 special values such as NaN (not a number) and positive and negative infinity, and there are many inconsistencies across different platforms. Still, Python 3.0 (and 2.6 too) incorporated many floating point improvements, and both implement the IEEE 754 standard much more closely.
The float() function that turns strings into floating point numbers now understands nan, +inf (or inf), and -inf and turns them into the Not A Number, Positive and Negative Infinity IEEE 754 values. (Case doesn't matter, so NaN, INF, etc., are valid too.)
The math module now has the functions isnan() and isinf(). The isinf() function doesn't distinguish between inf, +inf and -inf. Here are some examples:
>>> float('nan')
nan
>>> float('NaN') # Any case works
nan
>>> float('+inf')
inf
>>> float('-inf')
-inf
>>> float('INF')
inf
>>> float('nan') + float('inf')
nan
>>> float('inf') + float('-inf')
nan
>>> float('inf') - float('-inf')
inf
>>> import math
>>> math.isnan(float('nan'))
True
>>> math.isinf(float('inf'))
True
>>> math.isinf(float('-inf'))
True
>>> math.isinf(float('nan'))
False
>>> math.isnan(float('-inf'))
False

The math module now also has a copysign(x, y) function that returns the absolute value of x with the sign of y. I don't understand why this function exists instead of a simple sign() function that returns -1 or 1, or a couple of ispositive() and isnegative() functions. The documentation is very succinct:

>>> help(math.copysign)
Help on built-in function copysign in module math:

copysign(...)
    copysign(x,y)

    Return x with the sign of y.
However, copysign works as advertised except for NaN. If you try to copy the sign of NaN you get inconsistent results—a negative sign on Mac OS X and a positive sign on Windows. A closed bug says this behavior is OK. I disagree. NaN is not a number, and as such has no sign. Trying to copy the sign of NaN is like trying to copy the sign of any other non-number value (string, list, object) and should result in an exception.
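One thing copysign() can do that an ordinary comparison cannot is tell -0.0 from 0.0, since the two compare equal. The sketch below is an illustration of that use, not a statement of why the function was added; sign() is a hypothetical helper name.

```python
import math

def sign(x):
    """Return -1.0 or 1.0 according to the sign bit of x (hypothetical helper)."""
    return math.copysign(1.0, x)

# The two zeros compare equal, so a comparison cannot separate them...
print(-0.0 == 0.0)

# ...but copysign() reads the sign bit directly.
print(sign(-0.0), sign(0.0), sign(-3.5))
```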
Some other functions related to floating point numbers were added to the math module too. math.fsum() adds up the stream of numbers from an iterable, and is careful to avoid loss of precision by tracking partial sums (unlike the built-in sum() function). If any of the numbers are NaN, the result is NaN. If a partial sum overflows to +inf or -inf, the built-in sum() function returns that infinity as the result, whereas math.fsum() raises an OverflowError exception, which is more in the spirit of IEEE 754:

>>> sum([1e308, 1, -1e308])
0.0
>>> math.fsum([1e308, 1, -1e308])
1.0
>>> sum([1e100, 1, -1e100, -1])
-1.0
>>> math.fsum([1e100, 1, -1e100, -1])
0.0
>>> x = [1e308, 1e308, -1e308]
>>> sum(x)
inf
>>> math.fsum(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: intermediate overflow in fsum
>>> sum([float('nan'), 3.3])
nan
>>> math.fsum([float('nan'), -float('nan')])
nan
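A more everyday case than 1e308 is adding many small decimal fractions. The sketch below assumes Python 3; total() is a hypothetical wrapper name. Only the math.fsum() results are stated, because the exact value returned by the built-in sum() for floats can differ between interpreter versions.

```python
import math

def total(values):
    """Sum floats with math.fsum(), which tracks partial sums to limit rounding error."""
    return math.fsum(values)

tenths = [0.1] * 10
print(total(tenths))                      # fsum gives exactly 1.0 here
print(total([1e100, 1, -1e100, -1]))      # cancellation is handled exactly

# An intermediate overflow is reported as an exception rather than as inf.
try:
    total([1e308, 1e308, -1e308])
except OverflowError as exc:
    print('overflow:', exc)
```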
The functions acosh(), asinh(), and atanh() compute inverse hyperbolic functions. The log1p() function returns the natural logarithm of 1+x (base e). The trunc() function rounds a number toward zero, returning the closest integer value:
>>> math.acosh(30)
4.0940666686320855
>>> math.acosh(1)
0.0
>>> math.asinh(1)
0.88137358701954305
>>> math.asinh(0)
0.0
>>> math.atanh(0.5)
0.54930614433405489
>>> math.log1p(2)
1.0986122886681098
>>> math.trunc(-1.1)
-1
>>> math.trunc(-1.9)
-1
>>> math.trunc(1.1)
1
>>> math.trunc(1.9)
1
>>> math.trunc(3.0)
3
You can convert floating-point numbers to or from hexadecimal strings. The conversion functions convert floats to and from a string representation without introducing rounding errors from the conversion between decimal and binary (if there are enough digits to represent the number fully). Floats have a hex() method that returns a string representation, while the float.fromhex() method converts a string back into a number (as accurately as possible):
>>> x = 4.2
>>> x.hex()
'0x1.0cccccccccccdp+2'
>>> float.fromhex('0x1.0cccccccccccdp+2')
4.2000000000000002
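The practical value of the hex form is that it round-trips exactly. A minimal Python 3 sketch follows; roundtrip() is a hypothetical helper name.

```python
def roundtrip(x):
    """Serialize a float to its hex string and parse it back (hypothetical helper)."""
    return float.fromhex(x.hex())

for value in (4.2, 0.1, 1e-300):
    # Each value should survive the trip bit for bit.
    print(value.hex(), roundtrip(value) == value)
```

This makes the hex form a reasonable choice for storing floats as text when exact reproduction matters, for example in test fixtures.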
The decimal module was updated to version 1.66 of the General Decimal Specification. New features include methods for some basic mathematical functions such as exp(), ln(), and log10():

>>> from decimal import Decimal
>>> Decimal(1).exp()
Decimal("2.718281828459045235360287471")
>>> Decimal("2.7182818").ln()
Decimal("0.9999999895305022877376682436")
>>> Decimal(1000).log10()
Decimal("3")

The as_tuple() method of Decimal objects now returns a named tuple (more on named tuples in a future article) with sign, digits, and exponent fields:
>>> Decimal('-3.3').as_tuple() DecimalTuple(sign=1, digits=(3, 3), exponent=-1)
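Because the result is a named tuple, the fields can be read by name, and the tuple can be passed back to the Decimal constructor. A minimal Python 3 sketch, where the variable names are illustrative assumptions:

```python
from decimal import Decimal

parts = Decimal('-3.3').as_tuple()

# Fields are available by name as well as by position.
print(parts.sign, parts.digits, parts.exponent)

# The Decimal constructor accepts a (sign, digits, exponent) tuple,
# so the value can be rebuilt from its parts.
rebuilt = Decimal(parts)
print(rebuilt)
```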
A new variable in the sys module, float_info, is an object that contains information derived from the float.h file about the platform's floating-point support:
>>> import sys
>>> sys.float_info
sys.floatinfo(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.2204460492503131e-16, radix=2, rounds=1)
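One common use of these values is building a tolerance for float comparisons. The sketch below assumes Python 3 on a platform with IEEE 754 doubles; nearly_equal() and its default tolerance of four epsilons are illustrative assumptions, not a recommendation from the article.

```python
import sys

EPS = sys.float_info.epsilon  # gap between 1.0 and the next representable float

def nearly_equal(a, b, rel=4 * EPS):
    """Compare two floats with a relative tolerance (hypothetical helper)."""
    return abs(a - b) <= rel * max(abs(a), abs(b))

# 0.1 + 0.2 is not exactly 0.3, but it is within a few epsilons of it.
print(0.1 + 0.2 == 0.3)
print(nearly_equal(0.1 + 0.2, 0.3))
```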
Overall, Python has definitely elevated its level of support for floating point numbers—but it is not perfect yet. The Numpy external package is still the best tool for serious number
PEP 3141: A Type Hierarchy for Numbers
Python 3.0 introduces a module called numbers (available in Python 2.6, too). This module contains abstract base classes (ABCs) for different number types. In effect, it creates a hierarchy of number types, where each type is a subtype of a more general type. This hierarchy was inspired by Scheme's numeric tower. There are five different ABCs: Number, Complex, Real, Rational, and Integral. Each one (except Number) is a subclass of its predecessor. For example, Integral is a subtype of Rational because you can think of every integer X as the rational number X/1, where X is the numerator and 1 is the denominator. The Number base class doesn't correspond to any actual number type. It's there just in case you want to verify that a value is a number. Usually, it is sufficient to check whether the argument is an instance of numbers.Complex, but in rare cases you may want to introduce a new kind of number between Number and Complex by registering it, rather than by subclassing Complex.

The Complex, Real, and Integral ABCs are implemented by the complex, float, and int types, respectively:

>>> issubclass(int, numbers.Integral)
True
>>> issubclass(float, numbers.Real)
True
>>> issubclass(complex, numbers.Complex)
True
>>> issubclass(complex, numbers.Real)
False

The Rational ABC is implemented by the fractions.Fraction type from the new fractions module. This module also has a gcd() function for finding the greatest common divisor, and the Fraction class provides a couple of conversion methods: from_float() and from_decimal(). It would have been cleaner and more consistent to add a rational numeric type to the language, but that ended up as a class in a module:

>>> issubclass(fractions.Fraction, numbers.Rational)
True
>>> fractions.gcd(Fraction(1,3), Fraction(1,2))
Fraction(1, 6)
>>> fractions.gcd(Fraction(2,6), Fraction(2,3))
Fraction(1, 3)
>>> fractions.gcd(6,9)
3
>>> Fraction.from_float(0.5)
Fraction(1, 2)
>>> import math
>>> Fraction.from_float(math.pi)
Fraction(884279719003555, 281474976710656)
>>> Fraction.from_decimal(decimal.Decimal('3.5'))
Fraction(7, 2)

The Decimal class from the decimal module for fixed point and floating point arithmetic (introduced in Python 2.4) doesn't participate in the party, and doesn't implement any of the number ABCs. PEP 3141 mentions that, after consulting the authors of the decimal module, it was decided it was better not to integrate it at this time.

This semi-formal hierarchy of number types helps when you write functions that expect certain kinds of numbers as arguments, because you can check and verify the arguments more easily.
For example, suppose you want to write a square function that takes a number and multiplies it by itself—but you want to stay in the realm of real numbers, and you always expect the result to be positive. Here's a trivial implementation that doesn't check types:
def square(x):
    return x * x

The preceding code will fail to return a positive result if you call it with a complex number:

>>> square(complex(0,1))
(-1+0j)

You could try an explicit check to ensure that the argument is an int or float, but that's somewhat clunky, especially if you want to support rational numbers too:

>>> def square(x):
...     assert type(x) == int or \
...         type(x) == float or \
...         type(x) == Fraction
...     return x * x
...
>>> from fractions import Fraction
>>> square(5)
25
>>> square(4.4)
19.360000000000003
>>> square(Fraction(1,3))
Fraction(1, 9)
>>> square(complex(0,1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in square
AssertionError

You may also come up with your own numeric types by either subclassing one of the existing types, or one of the ABCs from the numbers module. Then you'll be able to pass your new numbers to code that expects those specific types, and checks its arguments based on an ABC. This can be useful, for example, when you work with integers in a limited domain, or with real numbers that have fixed point semantics but limited precision. I won't give a full-fledged example here, because you need to implement a lot of methods to comply with any of the ABCs.
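For comparison with the type-by-type assert, here is a sketch of the same square() function checking its argument against the numbers.Real ABC instead. It assumes Python 3 (or 2.6); raising TypeError rather than using assert is a design choice made for this illustration.

```python
import numbers
from fractions import Fraction

def square(x):
    """Square a real number; reject anything that is not a numbers.Real."""
    if not isinstance(x, numbers.Real):
        raise TypeError('square() expects a real number, got %s' % type(x).__name__)
    return x * x

print(square(5))
print(square(Fraction(1, 3)))

try:
    square(complex(0, 1))
except TypeError as exc:
    print('rejected:', exc)
```

A single isinstance() check covers int, float, and Fraction, and it would also accept any user-defined type registered as a Real. Note that Decimal would be rejected, since it does not implement any of the number ABCs.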
All Strings Are Now Unicode
In Python 2.x there are two types of strings: byte strings (str) and Unicode strings (unicode). Byte strings contain bytes (usually interpreted by Python based on your default locale). Unicode strings, of course, contain Unicode characters:

>>> s = 'hello'
>>> u = u'\u05e9\u05dc\u05d5\u05dd'
>>> type(s)
<type 'str'>
>>> type(u)
<type 'unicode'>

Both str and unicode were derived from a common base class called basestring:

>>> unicode.__bases__
(<type 'basestring'>,)
>>> unicode.__base__
<type 'basestring'>
>>> str.__bases__
(<type 'basestring'>,)

In Python 3.0, all strings are Unicode. The str type has the same semantics as unicode in Python 2.x, and there is no separate unicode type. The basestring base class is gone as well:

>>> s = '\u05e9\u05dc\u05d5\u05dd'
>>> type(s)
<class 'str'>

Instead of Python 2.x's byte string there are now two types: bytes and bytearray. These are, respectively, the immutable and mutable versions of a byte array. The bytes type supports a large number of string-like methods, as shown below:
>>> dir(bytes)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'center', 'count', 'decode', 'endswith', 'expandtabs', 'find', 'fromhex', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

The bytearray type also has the following mutating methods: extend(), insert(), append(), reverse(), pop(), and remove().
They also support the + and * operators (using the same semantics as strings) and the bytearray type also supports += and *=.
You can't convert to or from str without an explicit encoding, because neither bytes nor bytearray knows about encodings, and str objects must have one. If you pass a bytes or bytearray object directly to str(), you get the result of repr() instead. To convert, you must use the decode() method:

>>> a = bytearray(range(48, 58))
>>> a
bytearray(b'0123456789')
>>> s = str(a)
>>> s
"bytearray(b'0123456789')"
>>> s = a.decode()
>>> s
'0123456789'

To convert from a string to bytes or bytearray you must use the string's encode() method or provide an encoding to the constructor of the bytes or bytearray object:

>>> s = '1234'
>>> s
'1234'
>>> b = bytes(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding
>>> b = s.encode()
>>> b
b'1234'
>>> b = bytes(s, 'ascii')
>>> b
b'1234'

The string representation has changed too. In Python 2.x the return type of repr() was str, which was an ASCII-based string. In Python 3.0 the return type is still str, but it's now a Unicode string. The default encoding of the string representation is determined by the output device.
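The difference between characters and bytes is easier to see with non-ASCII text. The sketch below assumes Python 3 and reuses the Hebrew string from the earlier example; the choice of UTF-8 is an assumption made for illustration.

```python
text = '\u05e9\u05dc\u05d5\u05dd'   # four Unicode characters

# Encoding turns characters into bytes; UTF-8 needs two bytes per Hebrew letter.
data = text.encode('utf-8')
print(len(text), len(data))

# Decoding with the same encoding restores the original string.
print(data.decode('utf-8') == text)

# A bytearray is the mutable counterpart and supports in-place changes.
buf = bytearray(data)
buf += b'!'
print(buf.decode('utf-8') == text + '!')
```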
PEP 3101: Advanced String Formatting
Python 3.0 brings a powerful new way to format strings that's based on Microsoft's .NET composite formatting (an excellent choice). I have used the string formatting facilities of many programming languages, but the C# formatting (which uses .NET composite formatting) experience was the best by far. It was powerful, flexible, consistent, and well documented. In Python 2.x you can format strings using the % operator or using string.Template. The % operator is convenient; when you want to format only a single argument, you can pass it as is:

>>> import time
>>> time.localtime()
(2008, 12, 31, 10, 32, 16, 2, 366, 0)
>>> 'The current year is %d' % time.localtime()[0]
'The current year is 2008'

To format multiple arguments, you must pack them in a tuple:

>>> t = time.localtime()
>>> 'Day: %d, Month: %d, Year: %d' % (t[2], t[1], t[0])
'Day: 31, Month: 12, Year: 2008'

With the tuple approach you must specify the arguments in the exact order they will be formatted. Also, if you want the same value to appear multiple times you must pass it multiple times:

>>> s = 'The solution to the square of %d is: %d * %d = %d'
>>> s % (5, 5, 5, 5 * 5)
'The solution to the square of 5 is: 5 * 5 = 25'

Alternatively, you can pass a dictionary and specify the dictionary keys in the format string:

>>> d = dict(n=5, result=5 * 5)
>>> s = 'The solution to the square of %(n)d is: %(n)d * %(n)d = %(result)d'
>>> s % d
'The solution to the square of 5 is: 5 * 5 = 25'

As you can see, the dictionary approach lets you specify repeating values just once, but at a high price: the format string becomes more complicated, and you must prepare the dict rather than simply passing values.
Finally, there is also the string.Template class. You use this to prepare compiled templates that you can apply multiple times to different values efficiently, because the format string itself must be parsed only once. This is especially important for use cases such as templated web pages or code generation scenarios, where the resulting text can be large, and parsing the format string can be expensive. The format string is a little different. Named values are preceded by a $ sign and optionally enclosed in curly braces to distinguish them from the surrounding text:

>>> import string
>>> s = 'The solution to the square of ${n} is: ${n} * ${n} = ${result}'
>>> t = string.Template(s)
>>> for i in range(1, 7):
...     d = dict(n=i, result=i * i)
...     print t.substitute(d)
...
The solution to the square of 1 is: 1 * 1 = 1
The solution to the square of 2 is: 2 * 2 = 4
The solution to the square of 3 is: 3 * 3 = 9
The solution to the square of 4 is: 4 * 4 = 16
The solution to the square of 5 is: 5 * 5 = 25
The solution to the square of 6 is: 6 * 6 = 36

Python 3.0 added a new formatting method called format to the string class. It is intended to replace the % formatting of short format strings and not the string.Template formatting, because it doesn't compile its format string. The format() method understands both positional and keyword arguments within a single format string. You enclose substitution fields in the format string in curly braces. You can reuse the same positional argument multiple times in different fields:

>>> s = 'Addition is commutative. For example: {0} + {1} = {1} + {0}'
>>> s.format(5, 7)
'Addition is commutative. For example: 5 + 7 = 7 + 5'
>>> s = '{0} multiplied by {1} is {result}'
>>> s.format(4, 3, result=3 * 4)
'4 multiplied by 3 is 12'

You can escape curly braces by doubling them:

>>> '{0} "{{", {1} "}}"'.format('open curly:', 'closed curly:')
'open curly: "{", closed curly: "}"'

The format() method supports both simple fields, which are either names or base-10 integers, and compound fields. Compound fields are quite useful because they allow you to access object attributes or elements of arrays:

>>> import fractions
>>> r = fractions.Fraction(5, 4)
>>> '{0.numerator} / {0.denominator}'.format(r)
'5 / 4'
>>> 'Day: {0[2]}, Month: {0[1]}, Year: {0[0]}'.format(time.localtime())
'Day: 31, Month: 12, Year: 2008'

The ability to access attributes and array elements simplifies their use because a developer needs to provide only the object or tuple/list/array, not break it up and arrange the parts in the right order. Compare the preceding example to the Python 2.x version presented earlier.
Unlike some templating languages, you may not use arbitrary Python expressions in the format strings. The Python 3.0 format string is limited to objects, attributes, and indexing into tuples/arrays/lists.
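The field kinds described above can be mixed freely in one format string. A short Python 3 sketch follows; the sample values and the variable names r, date, and line are assumptions made for illustration.

```python
from fractions import Fraction

r = Fraction(5, 4)
date = (2008, 12, 31)   # (year, month, day)

# Positional field with attribute access, positional field with indexing,
# and a keyword field, all in one format string.
line = '{0.numerator}/{0.denominator} on {1[2]}.{1[1]}.{1[0]} ({note})'.format(
    r, date, note='ok')
print(line)

# Indexing also works with dictionary keys (written without quotes).
print('{0[name]} is {0[age]}'.format({'name': 'Gigi', 'age': 30}))
```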
The format() method supports a wide array of format specifiers for fine-tuning the display of formatted fields. You separate format specifiers from the field name with a colon (:) character:
>>> 'The "{0:10}" is right padded to 10 characters'.format('field')
'The "field     " is right padded to 10 characters'

Objects may define and accept their own format specifiers in the __format__ method (see below), but Python also has a large selection of standard specifiers that apply to every object. The general form of a standard format specifier is:

[[fill]align][sign][#][0][minimumwidth][.precision][type]

There are many fine details and constraints. Some format specifiers make sense only for numeric types, or only if other specifiers exist. There are many display options for integers and real numbers, for example:

>>> '{0:@^8.4}'.format(1 / 3)
'@0.3333@'

OK, what happened here? The at sign (@) is the fill character. The alignment is centered (^). The precision is 4 and the minimum width is 8, so the number was formatted to have four significant digits (0.3333). The zero and the decimal point took two other characters, so two additional @ characters were added as padding to get a centered display of eight characters. All this is similar to Python 2.x's % formatting, but much more flexible and powerful.
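A few more combinations of the standard specifier parts may help make the general form concrete. This is a Python 3 sketch; the sample values and the examples dictionary are illustrative assumptions.

```python
# Each entry exercises a different part of
# [[fill]align][sign][#][0][minimumwidth][.precision][type].
examples = {
    'right aligned': '{0:>10}'.format('field'),     # align + width
    'star centered': '{0:*^9}'.format('mid'),       # fill + align + width
    'zero padded':   '{0:08.3f}'.format(3.14159),   # 0 + width + precision + type
    'forced sign':   '{0:+d}'.format(5),            # sign + type
    'hex prefix':    '{0:#x}'.format(255),          # alternate form + type
}

for name, text in examples.items():
    print('%-14s %r' % (name, text))
```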
The real power of the new string formatting becomes evident for custom formatting, which you define by implementing the __format__() method. The signature is:
def __format__(self, format_spec):
    ...

Suppose you want to have a ColorString class that can format itself to be displayed in different colors. To print colored text (and much more) to the screen in Python you can use ANSI escape codes on Linux and Mac OS X. On 32-bit Windows you need to use the SetConsoleTextAttribute() API.

Author's Note: The code presented here will not work properly on Windows; it will just print junk characters around the original text instead of changing the colors.

So, to print some red text, type:

print('\033[31mRed Text\033[0m')

The escape sequence starts with ESC+[ (also known as the Control Sequence Introducer). The ESC character is non-printable, and can also be written as chr(27) or \x1b (hex notation). Note that 033 is octal notation for 27. The 31m following the \033[ is the incantation used to change the text color to red. The actual text (Red Text) is next, and finally, another incantation restores the colors to their default (\033[0m). Although Python itself has switched its octal literal notation from 0{number} to 0o{number}, octal escapes inside string literals still use a backslash followed by the octal digits, as in \033.
You can do a lot with the escape sequences, such as change text and background color, move the cursor around the screen (to print in a specific location), erase parts of the screen, hide/show the cursor, and scroll the screen buffer. The examples here focus on changing colors only.
Here's a little module containing a function called colorize() that accepts three arguments: a string, a text color, and a background color. It then wraps the string with the appropriate ANSI escape sequence. First, it prepares a small global dictionary containing all the colors and background colors mapped from a string to their ANSI escape code. The function itself checks whether a color and/or background color were provided by name such as red or green, finds the corresponding codes in the dictionary, and prepares a proper escape sequence to change the colors to the requested colors. Finally, it resets everything to normal. The code shown here has no error checking, so if you request a color name that doesn't exist you will get a KeyError exception:
colors = ['black', 'red', 'green', 'orange', 'blue', 'magenta', 'cyan', 'white']
color_dict = {}
for i, c in enumerate(colors):
    color_dict[c] = (i + 30, i + 40)

def colorize(text, color=None, bgcolor=None):
    c = None
    bg = None
    if color is not None:
        c = color_dict[color][0]
    if bgcolor is not None:
        bg = color_dict[bgcolor][1]
    s = ''
    if c is not None:
        s = '\033[%dm' % c
    if bg is not None:
        s += '\033[%dm' % bg
    return '%s%s\033[0m' % (s, text)

You can experiment with this to print various colored strings on colored backgrounds. Here's an example that prints white text on a magenta background:

print(colorize('White on Magenta', 'white', 'magenta'))

This code and the colorize module work in both Python 2.x and 3.0.
With the colorize() function under your belt you can create the ColorString class that formats itself in color. The basic idea is to subclass the built in str class and add a __format__() method that takes the format_spec and passes it as the text color to the colorize() function, which returns the wrapped string:
class ColorString(str):
    def __format__(self, format_spec):
        s = colorize(self, format_spec)
        return s

This implementation lets you change only the text color and not the background, but it makes the format very simple (you just supply the color name). Here is ColorString in action. First, the example prepares a list of ColorString words by splitting a simple sentence ("Yeah, it works!") and then prints each word in a different color, by specifying a format string with the colors red, green, and blue:

words = [ColorString(x) for x in 'Yeah, it works!'.split()]
print('{0:red} {1:green} {2:blue}'.format(*words))

Python 3.0 also has a global format() function used to format single objects. It simply calls the object's __format__() method. Here it is at work with ColorString:

>>> format(ColorString('Gigi'), 'red')
'\x1b[31mGigi\x1b[0m'

This subclassing scheme works fine, but it feels a bit cumbersome to create a special class with a __format__() method whenever you want some custom formatting. In addition, the subclassing scheme requires developers to construct special objects such as ColorString to take advantage of the formatting. Fortunately, you can go even further by implementing your own formatter classes and using them to format any type. For example, it would be convenient to just be able to print text in any color you want. The next example shows a class called ColorFormatter, which subclasses the string.Formatter class and overrides the format_field method. The override colorizes the field if it finds the format_spec in the colors list, or just applies the default formatting by calling Formatter.format_field():

from string import Formatter

class ColorFormatter(Formatter):
    def format_field(self, value, format_spec):
        if format_spec in colors:
            return colorize(value, format_spec)
        else:
            return Formatter.format_field(self, value, format_spec)

To use a custom formatter you need to instantiate it and then call its format() method to get the formatted string. To make it even more streamlined I assigned the bound format method to a variable named f, so it's easier to use:

formatter = ColorFormatter()
f = formatter.format
print(f('{0:cyan} works very {1:orange}.', 'ColorFormatter', 'well'))

If you have a list of field values or a dictionary with named fields you can use the vformat() method, which takes a list for positional arguments and a dictionary for keyword arguments:

formatter = ColorFormatter()
f = formatter.vformat
args = ['The', 'vformat()']
kwargs = dict(m='method', t='too')
print(f('{0:red} {1:blue} {m:green} works {t:magenta}', args, kwargs))

If you are a Windows developer looking for a little Python 3.0 homework, a good exercise would be to implement the color ANSI escape codes for Windows by replacing the new print function. Your replacement print function should scan the text to print looking for ANSI escape sequences, parse them, and apply the proper color setting using the SetConsoleTextAttribute() API.
This article showed a wide range of examples detailing how the deep changes in Python 3.0 affect data types, math operations, and string formatting. Beyond these, Python 3.0 also made significant changes to the standard library, which you'll explore in the next article in this series.