Wednesday, January 23, 2019

So a list and a tuple walk into a sum()

As a direct side effect of glom's 19.1.0 release, the authors here at PDW got to re-experience one of the more surprising behaviors of three of Python's most basic constructs: sum(), list, and tuple.
Most experienced developers know the quickest way to combine a short list of short lists:
list_of_lists = [[1], [2], [3, 4]]
sum(list_of_lists, [])
# [1, 2, 3, 4]
Ah, nice and flat, much better.

But what happens when we throw a tuple into the mix:
list_of_seqs = [[1], [2], (3, 4)]
sum(list_of_seqs, [])
# TypeError: can only concatenate list (not "tuple") to list
This is kind of surprising! Especially when you consider this:
seq = [1, 2]
seq += (3, 4)
# [1, 2, 3, 4]
Why should sum() fail when addition succeeds?! We'll get to that.
new_list = [1, 2] + (3, 4)
# TypeError: can only concatenate list (not "tuple") to list
There's that error again!

The trick here is that Python has two addition operators. The simple "+" or "add" operator, used by sum(), and the more nuanced "+=" or "iadd" operator, add's inplace variant.

But why is it OK for one addition to error and the other to succeed?

Symmetry. And maybe commutativity if you remember that math class.

"+" in Python is symmetric: A + B and B + A should always yield the same type of result. To do otherwise would be more surprising than any of the surprises above. list and tuple cannot be added with this operator because in a mixed-type situation, the return type would change based on ordering.

Meanwhile, "+=" is asymmetric. The left side of the statement determines the type of the return completely. A += B keeps A's type. A straightforward, Pythonic reason if there ever was one.
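
The difference can be seen side by side with the operator module. This is only a small sketch of the behavior described above; the variable names are made up for illustration. Note that the asymmetry depends on the left operand supporting an in-place add: a tuple does not, so "+=" on a tuple falls back to "+" and fails the same way.

```python
import operator

a = [1, 2]
alias = a

# "+=" on a list behaves like list.extend(): any iterable is accepted,
# and the existing list object is mutated in place.
a += (3, 4)
same_object = a is alias

# operator.iadd is the function form of "+=".
b = operator.iadd([1, 2], (3, 4))

# "+" refuses to pick a return type for mixed operands.
try:
    operator.add([1, 2], (3, 4))
except TypeError as exc:
    add_error = str(exc)

# A tuple has no in-place add, so "+=" falls back to "+" and fails too.
t = (1, 2)
try:
    t += [3, 4]
except TypeError as exc:
    tuple_error = str(exc)
```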

Going back to the start of our story, by building on operator.iadd, glom's new flatten() function avoids sum()'s error-raising behavior and works wonders on all manner of nested iterables.
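
For a rough idea of what building on operator.iadd looks like, here is a minimal sketch. It is not glom's actual implementation; flatten_one_level is a hypothetical helper that folds the inputs into a fresh list with functools.reduce.

```python
import operator
from functools import reduce


def flatten_one_level(seqs):
    """Hypothetical helper: concatenate an iterable of iterables into a new list."""
    # The fresh [] is the only object mutated by the in-place adds,
    # so the caller's lists and tuples are left untouched.
    return reduce(operator.iadd, seqs, [])


list_of_seqs = [[1], [2], (3, 4)]
flat = flatten_one_level(list_of_seqs)
```

Starting from a new empty list matters: without an initial value, reduce would extend the first inner list in place and quietly modify the input.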

Friday, September 14, 2018

kids these days think data structures grow on trees

Args and kwargs are great features of Python. They do have a measurable (though highly variable) cost, however:

>>> import timeit
>>> timeit.timeit(lambda: (lambda a, b: None)(1, b=2))

>>> timeit.timeit(lambda: (lambda *a, **kw: None)(1, b=2))

>>> timeit.timeit(lambda: (lambda *a, **kw: None)(1, b=2)) - timeit.timeit(lambda: (lambda a, b: None)(1, b=2))

Constructing that dict and tuple doesn't happen for free:

>>> timeit.timeit(lambda: ((1,), {'b': 2})) - timeit.timeit(lambda: None)

Specifically, it takes about 1/5,000,000th of a second.
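
A note on reading these numbers: timeit.timeit() runs the callable 1,000,000 times by default and returns the total in seconds, so the raw result can be read as microseconds per call. The sketch below makes the division explicit; per_call_seconds is a hypothetical helper, it uses a smaller iteration count to run quickly, and the figures will vary by machine and Python version.

```python
import timeit

N = 100_000  # fewer iterations than timeit's default of 1,000,000


def per_call_seconds(func, number=N):
    """Hypothetical helper: average wall-clock seconds per call of func."""
    return timeit.timeit(func, number=number) / number


baseline = per_call_seconds(lambda: None)
packed = per_call_seconds(lambda: ((1,), {'b': 2}))
print("approx. cost of building the tuple and dict: %.3g s" % (packed - baseline))
```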

Tuesday, June 5, 2018

when no-ops attack VII: assignment's revenge

Let's define a very simple class:

>>> class F(object):
...    @staticmethod
...    def f(): return "I'm such a simple function, nothing could go wrong"

>>> F.f()
"I'm such a simple function, nothing could go wrong"

Now, let's do a trivial no-op to this class:

>>> F.f = F.f

Surely nothing changed, right?

>>> F.f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unbound method f() must be called with F instance as first argument (got nothing instead)

What happened?  staticmethod uses the descriptor protocol in order to return something other than itself when accessed as an attribute.  The assignment above is not a no-op, because it is not setting the value back to what it already was, but to what was returned by __get__ of the staticmethod object.

>>> class F(object):
...    @staticmethod
...    def f(): return "I'm not what I seem"
>>> F.f
<function f at 0x7f05eda596e0>
>>> F.__dict__['f']
<staticmethod object at 0x7f05eda5ce50>

Version note -- Python 3 doesn't raise an exception, although the type still changes from staticmethod to function.

>>> class F:
...    @staticmethod
...    def f(): return "I'm protected by python3 wizardry"
>>> F.f()
"I'm protected by python3 wizardry"
>>> F.__dict__['f']
<staticmethod object at 0x7fd087b739b0>
>>> F.f = F.f
>>> F.__dict__['f']
<function F.f at 0x7fd087b5cae8>
>>> F.f()
"I'm protected by python3 wizardry"
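
The Python 3 protection only covers calls made through the class. As a sketch (the variable names are illustrative only), the snippet below calls __get__ by hand to show what attribute access returns, then shows that calling through an instance breaks after the "no-op", because the stored plain function is now bound like any other method.

```python
class F:
    @staticmethod
    def f():
        return "called"


raw = F.__dict__['f']              # the staticmethod wrapper stored on the class
unwrapped = raw.__get__(None, F)   # what the attribute access F.f hands back
before = F().f()                   # fine: staticmethod ignores the instance

F.f = F.f                          # stores the plain function, dropping the wrapper

after_class_call = F.f()           # still fine when called on the class
try:
    F().f()                        # the instance is now passed as an argument
except TypeError as exc:
    instance_error = str(exc)
```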

Thursday, May 31, 2018

(i)t(er)able for one

When you expect a sequence to have exactly one item and only want that item, it is common to grab the zeroth element. This fails if the sequence is unexpectedly empty, but it silently throws away any extra elements:

>>> a = 'a'[0]
>>> a = ''[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>> a = 'ab'[0]  # oops, silently dropped b

An alternative idiom is to use sequence unpacking with a single item.  This way neither unexpected condition will silently pass.
>>> a, = 'a'
>>> a, = ''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 0 values to unpack
>>> a, = 'ab'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
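
The idiom can be wrapped in a small function when it is needed in several places. This is just a sketch; only is a hypothetical name, not a builtin. Under Python 3 the exception is still ValueError, though the messages are worded differently from the Python 2 output above.

```python
def only(iterable):
    """Hypothetical helper: return the sole item of iterable, else raise ValueError."""
    item, = iterable  # unpacking fails unless there is exactly one item
    return item


single = only('a')
from_generator = only(x * 2 for x in [21])
```

Because unpacking works on any iterable, this also covers generators and sets, where indexing with [0] is not available.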

Saturday, May 12, 2018

Captain, the python grammar can't take anymore!

The expressions are going to tear themselves to pieces!
>>> 'a'       .strip    (    ) [    0 ]

Friday, April 20, 2018

DISappearing and

Python has a very rich set of operators that can be overloaded.  From __get__ to __getattr__, __repr__ to __format__, and __complex__ to __iadd__ you can modify almost every behavior of your type.  Conspicuously absent, however, are the boolean operators.

This is why Django ORM and SQLAlchemy use bitwise & and | to represent SQL AND / OR.
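
As a toy illustration of the approach (this is not the API of either library; Cond is a hypothetical class), overloading __and__ and __or__ lets an expression object record how it was combined, while the keyword "and" simply returns one of its operands:

```python
class Cond:
    """Hypothetical stand-in for an ORM filter expression."""

    def __init__(self, sql):
        self.sql = sql

    def __and__(self, other):
        return Cond("(%s AND %s)" % (self.sql, other.sql))

    def __or__(self, other):
        return Cond("(%s OR %s)" % (self.sql, other.sql))


expr = (Cond("age > 18") & Cond("active = 1")) | Cond("admin = 1")

# "and" never calls __and__: it checks truthiness and returns an operand,
# so the left-hand condition silently disappears.
lost = Cond("age > 18") and Cond("active = 1")
```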

Let's take a closer look at how the Python compiler treats these operators:

>>> import dis
>>> dis.dis(lambda: a & b)
  1           0 LOAD_GLOBAL              0 (a)
              3 LOAD_GLOBAL              1 (b)
              6 BINARY_AND
              7 RETURN_VALUE
>>> dis.dis(lambda: a and b)
  1           0 LOAD_GLOBAL              0 (a)
              3 JUMP_IF_FALSE_OR_POP     9
              6 LOAD_GLOBAL              1 (b)
        >>    9 RETURN_VALUE

Not only can you not override the and operator, the Python VM doesn't even have an opcode for it.

In return, Python gives you the semantics that a or b returns not True or False, but either a or b: a if a is truthy, otherwise b (even when b is falsy too).
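
A few concrete cases, with illustrative variable names, show which operand comes back and that the right-hand side is skipped when the left side already decides the result:

```python
first_truthy = '' or 'default'   # left is falsy, so the right operand is returned
both_falsy = 0 or []             # neither is truthy: the last operand comes back
and_stops_early = 0 and 1 / 0    # the division is never evaluated
and_all_truthy = 'a' and 'b'     # both truthy: the last operand comes back
```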