Saturday, April 29, 2017

a return to yield

I remember when, almost a decade ago, I was first discovering generators. It was a heady time, and I saw applications everywhere.

def fib_gen():
    x, y = 1, 1
    while x < 100:
        x, y = y, x + y
        yield x

I also remember the first time I tried to mix a return value into my generator.

def fib_gen():
    x, y = 1, 1
    while x < 100:
        x, y = y, x + y
        yield x
    return True

Imagine my surprise, which I'm sure countless others have shared:

SyntaxError: 'return' with argument inside generator
A rare compile-time error! Only the decorative, bare return is allowed in generators, where it serves to raise StopIteration.

Now imagine my surprise so many years later, when I imported that same code in Python 3:


Nothing! No error. So what happened?

Turns out that Python 3's coroutine machinery, starting with PEP 380's yield from in Python 3.3 and later built on by asyncio, has repurposed this old impossibility.

If we manually iterate to skip over our yield:

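fib_iter = fib_gen()
for i in range(11):
    next(fib_iter)  # consume all 11 yielded values
next(fib_iter)  # one more step runs off the end and hits the return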

We see what's really happening with our return:

Traceback (most recent call last):
  File "", line 13, in <module>
StopIteration: True

That's right: returns in generators now raise StopIteration, with the return value as its single argument.
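To actually get at that return value, you can catch the StopIteration yourself. Here's a minimal sketch assuming Python 3.3+, where the exception also exposes the returned object as its value attribute; the drain() helper is a hypothetical name made up for this illustration:

```python
def fib_gen():
    x, y = 1, 1
    while x < 100:
        x, y = y, x + y
        yield x
    return True


def drain(gen):
    """Hypothetical helper: exhaust gen and return (yielded values, return value)."""
    values = []
    while True:
        try:
            values.append(next(gen))
        except StopIteration as stop:
            # stop.value holds the generator's return value
            return values, stop.value


values, result = drain(fib_gen())
print(values)  # [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
print(result)  # True
```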

Most of the time you won't see this. StopIteration is automatically consumed by for loops, list comprehensions, and sequence constructors (like list), which silently discard the return value. But it's yet another Python 3-specific reason to be extra careful when writing your own generators.
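The place where the return value does get used is yield from (PEP 380, Python 3.3+): the delegating generator passes through everything the inner generator yields, and the yield from expression itself evaluates to the inner generator's return value. A minimal sketch, where delegator() is a hypothetical name:

```python
def fib_gen():
    x, y = 1, 1
    while x < 100:
        x, y = y, x + y
        yield x
    return True


def delegator():
    # yield from re-yields every value, then evaluates to the return value
    finished = yield from fib_gen()
    yield ('finished', finished)


plain = list(fib_gen())        # list() swallows StopIteration, so True is lost
delegated = list(delegator())
print(plain[-1])      # 144
print(delegated[-1])  # ('finished', True)
```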

Wednesday, April 12, 2017

Bit by bit: CPU architecture

There are a variety of reasons you might want to know the bit width of the CPU architecture your Python program is running on. Maybe you're about to use some statically-compiled C, or maybe you're just taking a survey.

Either way, you've got to know. One historical way is:

import sys
IS_64BIT = sys.maxint > 2 ** 32

Except that sys.maxint is specific to Python 2. Since it marks the crossover point where ints transparently become longs, sys.maxint has no meaning in Python 3, where ints and longs have been merged into just one type: int (even though the C implementation calls it PyLongObject). And sys.maxsize, its closest replacement, doesn't help much if you're trying to support Python versions before 2.6, where it doesn't exist.
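For comparison, here's the sys.maxsize version of the same one-liner. It's a sketch that assumes an interpreter where sys.maxsize exists (every Python 3 has it); because it's the largest value of the C Py_ssize_t type, it tracks pointer width on mainstream platforms:

```python
import sys

# sys.maxsize is the largest Py_ssize_t: 2**31 - 1 on 32-bit builds,
# 2**63 - 1 on 64-bit builds.
IS_64BIT = sys.maxsize > 2 ** 32
print(IS_64BIT)
```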

So instead we can use the struct module:

import struct
IS_64BIT = struct.calcsize("P") > 4

This is a little less clear, but it's backwards- and forwards-compatible, and struct is still part of the standard library, so it's a pretty good approach. It's the one taken in boltons.ecoutils.

But let's say you really wanted to get it down to a single line, and even standard library imports were out of the question, for some reason. You could do something like this:

IS_64BIT = tuple.__itemsize__ > 4

While not extensively documented, a lot of built-in types have __itemsize__ and __basicsize__ attributes, which describe the memory layout of the underlying C structure: __basicsize__ is the fixed part of an instance, and __itemsize__ is the size of each variable-length item. For tuples, each item is a pointer, and pointer size in bytes * 8 = bits in the architecture: 4 * 8 = 32-bit architecture, and 8 * 8 = 64-bit architecture.
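A quick sanity check that the tuple trick agrees with the struct approach. This sketch assumes CPython; other implementations may report different layout sizes:

```python
import struct

# struct's native "P" format is a C void pointer.
pointer_bytes = struct.calcsize("P")

# Each tuple slot stores one PyObject* pointer, so on CPython the
# per-item size matches the pointer size.
slot_bytes = tuple.__itemsize__

print("pointer:", pointer_bytes, "bytes =", pointer_bytes * 8, "bits")
print("tuple item:", slot_bytes, "bytes; tuple header:", tuple.__basicsize__, "bytes")

IS_64BIT = tuple.__itemsize__ > 4
```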

Even though the documentation isn't great, the __itemsize__ approach works back to at least Python 2.6 and forward to Python 3.7. Memory profilers like pympler use __itemsize__, and it might work for you, too!

Tuesday, March 21, 2017

When you can update locals()

There are two built-in functions, globals() and locals(). These return dicts of the contents of the global and local scopes.

Locals usually refers to the contents of a function, in which case locals() returns a snapshot copy. Updates to that dict do not change the local scope:

>>> def local_fail():
...    a = 1
...    locals()['a'] = 2
...    print 'a is', a
>>> local_fail()
a is 1

However, in the body of a class definition, locals() returns the namespace dict the class is being built from, which is mutable: anything added to it becomes a class attribute.

>>> class Success(object):
...    locals().update({'a': 1})
>>> Success.a
1
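Here's the same pair of experiments as a Python 3 script, a minimal sketch that returns a instead of printing it. Writes through a function's locals() are ignored, while a class body's locals() is the namespace the class is built from:

```python
def local_fail():
    a = 1
    locals()['a'] = 2  # writes to a copy of the locals, not the variable itself
    return a


class Success(object):
    # In a class body, locals() is the namespace that becomes the class's attributes
    locals().update({'a': 1})


print(local_fail())  # 1
print(Success.a)     # 1
```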

Monday, March 13, 2017

identity theft

>>> class JSON(int):
...     from json import *
>>> json = JSON()
>>> json.dumps()
'0'
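The trick works because json.dumps is a plain Python function, and functions stored on a class become methods, so json.dumps() ends up serializing the instance itself (an int subclass with value 0). Python 3 rejects import * outside module level, so here's a minimal sketch of the same identity theft with an explicit assignment; the name fake is just for illustration:

```python
import json


class JSON(int):
    # A plain function stored on the class becomes a method,
    # so instance.dumps() calls json.dumps(instance).
    dumps = json.dumps


fake = JSON()        # int() with no argument is 0
print(fake.dumps())  # 0
```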

Monday, November 21, 2016

first hash past the post

Numeric types in Python have the interesting property that their hash() is often their value:

>>> hash(1)
1
>>> hash(1.0)
1
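"Often" is doing some work there. A sketch assuming CPython 3, where numeric hashes are defined so that equal numbers of any numeric type hash alike, and where -1 is reserved as an error signal by the C-level hash functions:

```python
from fractions import Fraction

# Equal values hash equally across numeric types.
assert hash(1) == hash(1.0) == hash(True) == 1
assert hash(Fraction(1, 2)) == hash(0.5)

# -1 is the exception: CPython's C hash functions use -1 to signal
# an error, so a hash of -1 is replaced with -2.
assert hash(-1) == -2
assert hash(-1.0) == -2
```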

Python also considers floats and integers with the same value to be equal:

>>> 1 == 1.0
True

Two things with the same hash that are equal count as the same key in a dict:

>>> 1 in {1.0: 'cat'}
True

However, it is possible for the key to be either an int or a float:

>>> {1: 'first'}
{1: 'first'}
>>> {1.0: 'second'}
{1.0: 'second'}

Whichever key is used first sticks.  Later writes to the dict can change the value, but the int or float key remains:

>>> {1: 'first', 1.0: 'second'}
{1: 'second'}
>>> {1.0: 'first', 1: 'second'}
{1.0: 'second'}
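The same rule applies to ordinary item assignment. A minimal sketch; to change which type is stored as the key, you have to remove the entry first:

```python
d = {1: 'first'}
d[1.0] = 'second'  # 1.0 == 1 and hash(1.0) == hash(1): updates the existing entry
print(d)           # {1: 'second'}

del d[1.0]         # any equal key can delete the entry...
d[1.0] = 'third'   # ...and only then does the float become the stored key
print(d)           # {1.0: 'third'}
```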

Wednesday, June 8, 2016

imports of no import

The Python standard library has hundreds of built-in modules (mine has 688 by one count). Some are more useful than others.

The most famous is "import this", which prints out the Zen of Python. The next most popular fun module could well be antigravity. Try importing it on a browser-capable machine, and (spoiler) you'll be taken to xkcd's "Python" comic.

And Randall is right. As most people know, Python's "Hello world" is just one line: "print 'Hello world'"

But what if there were another, more confusing way to do it? Taking a page out of their book, how about:

>>> import __hello__
"Hello world..."

And because a module only runs on its first import, this super-useful module only works once:

>>> import __hello__

It even behaves differently on Python 3:

>>> import __hello__
"Hello world!"

And if that wasn't esoteric enough of an import, how about even more new syntax in Python 3:

>>> from __future__ import barry_as_FLUFL
>>> 'a' <> 'b'
True
>>> 'a' <> 'a'
False

This unfortunate syntax is the result of an April Fools PEP from 2009 (PEP 401). Before that, attempts at introducing new syntax were met with a stiffer upper lip:

>>> from __future__ import braces
  File "<stdin>", line 1
SyntaxError: not a chance

All of which raises the question: how many undocumented jokes have made their way into Python?


Credit to Python core dev Raymond Hettinger and other Twitter friends for details and inspiration.

Thursday, May 5, 2016

String optimization in Python

Strings are terribly important in programming. A program without some form of string input, manipulation, and output is a rarity.

Of course, this means that speed and sanity surrounding string features are important. One important feature of Python is string immutability. This enables many features, such as using strings as dictionary keys, but there are some downsides.

Immutable strings mean that any string manipulation, such as splitting or appending, makes a copy of that string. This can become a performance problem, especially in a world where zero-copy is one of the favorite general optimization techniques. If you've done enough string manipulation, you're probably already familiar with techniques for avoiding those copies.
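Two widely used ways to avoid repeated copies when building a string from many pieces are collecting the pieces and calling str.join once, or writing them to an io.StringIO buffer. A minimal Python 3 sketch comparing them with naive +=; whether += really copies on every step is a CPython implementation detail:

```python
import io

words = ['spam'] * 1000

# Naive concatenation: each += may copy the growing string.
slow = ''
for w in words:
    slow += w

# Collect the pieces, then copy once with str.join.
fast = ''.join(words)

# Or write the pieces to an in-memory text buffer.
buf = io.StringIO()
for w in words:
    buf.write(w)
buffered = buf.getvalue()

print(slow == fast == buffered)  # True
```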
But in some cases Python uses the immutability to avoid making copies:

>>> a = 'a' * 1024 * 1024  # a 1 megabyte string
>>> z = '' + a
>>> z is a
True

Here, because adding an empty string does not change the value, z is the same exact string object as a. And it doesn't matter how many empty strings you add:

>>> z = '' + '' + '' + a
>>> z is a
True

It even works when a is the only item in a list:

>>> z = ''.join([a])
>>> z is a
True

But it falls apart when you put an empty string in the list with a:

>>> z = ''.join(['', a])
>>> z is a
False

And unfortunately even the first example seems to make a copy on PyPy:

>>>> a = 'a' * 1024 * 1024  # a 1 megabyte string again
>>>> z = '' + a
>>>> z is a
False

That said, something more advanced may be going on under the covers, as is often the case with PyPy.

I'm almost done stringing you along, but as a corollary reminder:
Never rely on "is" checks with ints, floats, and strings; "==" and other value checks are what you need. As a general rule, "is" is for identity checks: sentinel objects, None, and sometimes True/False.
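A minimal sketch of that rule, assuming CPython (where ''.join(['', a]) builds a new object). The _MISSING sentinel and get_setting() helper are hypothetical, shown only to illustrate a case where identity is exactly what you want:

```python
a = 'a' * 1024 * 1024
b = ''.join(['', a])  # same characters, new object on CPython

print(a == b)  # True: compares values
print(a is b)  # False: compares identity

_MISSING = object()  # hypothetical sentinel: only its identity matters


def get_setting(settings, name, default=_MISSING):
    """Hypothetical helper: 'is' tells "no default given" apart from default=None."""
    value = settings.get(name, _MISSING)
    if value is _MISSING:
        if default is _MISSING:
            raise KeyError(name)
        return default
    return value


print(get_setting({'debug': False}, 'debug'))  # False
print(get_setting({}, 'debug', None))          # None
```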

Keep on stringifying!