Saturday, April 29, 2017

a return to yield

I remember when, almost a decade ago, I was first discovering generators. It was a heady time, and I saw applications everywhere.

def fib_gen():
    x, y = 1, 1
    while x < 100:
        x, y = y, x + y
        yield x
    return

I also remember the first time I tried to mix a return value into my generator.

def fib_gen():
    x, y = 1, 1
    while x < 100:
        x, y = y, x + y
        yield x
    return True

Imagine my surprise, one I'm sure countless others have shared:

SyntaxError: 'return' with argument inside generator

A rare compile-time error! Only the decorative, bare return is allowed in generators, where it serves to raise StopIteration.

Now, imagine my surprise when, so many years later, I imported that same code in Python 3.


...

Nothing! No error. So what happened?

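Turns out that Python 3 has repurposed this old impossibility. Since Python 3.3 (PEP 380, which added yield from), a generator may return a value, and the coroutine and asyncio machinery builds on exactly that.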

If we manually iterate to skip over our yields:

fib_iter = fib_gen()
for i in range(11):
    next(fib_iter)
next(fib_iter)

We see what's really happening with our return:

Traceback (most recent call last):
  File "fib_gen.py", line 13, in <module>
    next(fib_iter)
StopIteration: True

That's right, a return in a generator now raises StopIteration with the return value as its single argument.
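To make that concrete, here is a minimal sketch for Python 3.3+ that exhausts the generator by hand and reads the return value off the exception's value attribute. The drain helper is a hypothetical name invented for this illustration, not something from the standard library.

```python
def fib_gen():
    x, y = 1, 1
    while x < 100:
        x, y = y, x + y
        yield x
    return True


def drain(gen):
    """Exhaust a generator by hand; return (yielded items, return value).

    `drain` is a hypothetical helper, used here for illustration only.
    """
    items = []
    while True:
        try:
            items.append(next(gen))
        except StopIteration as stop:
            # The generator's return value rides along on the exception.
            return items, stop.value


items, result = drain(fib_gen())
print(len(items), result)
```

A generator that ends with a bare return, or simply falls off the end, would hand back None as the second element.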

Most of the time you won't see this. StopIterations are automatically consumed and handled correctly by for loops, list comprehensions, and sequence constructors (like list). But it's yet another reason to be extra careful when writing your own generators, specific to Python 3.
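As a rough sketch of both sides of that coin: list quietly swallows the return value, while yield from (Python 3.3+) is the construct that actually receives it. The wrapper generator below is a made-up name for illustration.

```python
def fib_gen():
    x, y = 1, 1
    while x < 100:
        x, y = y, x + y
        yield x
    return True


def wrapper():
    # `yield from` re-yields everything fib_gen yields, then evaluates
    # to fib_gen's return value once it is exhausted.
    result = yield from fib_gen()
    yield "returned: %r" % (result,)


plain = list(fib_gen())     # the True is silently discarded here
wrapped = list(wrapper())   # same numbers, plus one extra string at the end
print(plain[-1], wrapped[-1])
```

If you never use yield from or catch StopIteration yourself, the return value simply disappears, which is why this rarely bites in practice.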

Wednesday, April 12, 2017

Bit by bit: CPU architecture

There are a variety of reasons you might want to know how many bits the architecture of the CPU running your Python program has. Maybe you're about to use some statically-compiled C, or maybe you're just taking a survey.

Either way, you've got to know. One historical way is:

import sys
IS_64BIT = sys.maxint > 2 ** 32

Except that sys.maxint is specific to Python 2. Being the crossover point where ints transparently become longs, sys.maxint doesn't apply in Python 3, where ints and longs have been merged into just one type: int (even though the C calls it PyLongObject). And its replacement, sys.maxsize, doesn't help much if you're trying to support Python 2 releases older than 2.6, where it doesn't exist.

So instead we can use the struct module:

import struct
IS_64BIT = struct.calcsize("P") > 4

This is a little less clear, but it is backwards and forwards compatible, and struct is still part of the standard library, so it's a pretty good approach. It's also the one taken in boltons.ecoutils.
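Spelled out a bit more, a small sketch of the struct approach might look like this. The constant names POINTER_BYTES and POINTER_BITS are my own, chosen for illustration.

```python
import struct

# "P" is the struct format code for a native pointer (void *),
# so calcsize reports the pointer width in bytes.
POINTER_BYTES = struct.calcsize("P")
POINTER_BITS = POINTER_BYTES * 8
IS_64BIT = POINTER_BYTES > 4

print("%d-bit interpreter" % POINTER_BITS)
```

One caveat worth keeping in mind: this measures the pointer size of the running interpreter build, so a 32-bit Python installed on a 64-bit machine reports 32.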

But let's say you really wanted to get it down to a single line, and even standard library imports were out of the question, for some reason. You could do something like this:

IS_64BIT = tuple.__itemsize__ > 4

While not extensively documented, a lot of built-in types have __itemsize__ and __basicsize__ attributes, which describe the memory requirements of the underlying structure. For tuples, each item requires a pointer, so tuple.__itemsize__ is the pointer size in bytes. Pointer size * 8 = bits in the architecture: 4 * 8 = 32-bit architecture, and 8 * 8 = 64-bit architecture.
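If you'd like some reassurance before relying on an under-documented attribute, a quick cross-check along these lines may help. It assumes CPython and Python 2.6+ or 3.x (for sys.maxsize); the checks dict and ALL_AGREE flag are names made up for this sketch.

```python
import struct
import sys

# Three independent ways of asking the same question, each in bits.
checks = {
    "tuple.__itemsize__": tuple.__itemsize__ * 8,
    'struct.calcsize("P")': struct.calcsize("P") * 8,
    "sys.maxsize": 64 if sys.maxsize > 2 ** 32 else 32,
}

for name, bits in sorted(checks.items()):
    print("%-22s -> %d-bit" % (name, bits))

# On CPython all three are expected to agree; other implementations
# may not expose __itemsize__ the same way.
ALL_AGREE = len(set(checks.values())) == 1
```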

Even though documentation isn't great, the __itemsize__ approach works back to at least Python 2.6 and forward to Python 3.7. Memory profilers like pympler use __itemsize__ and it might work for you, too!