Wednesday, January 23, 2019

So a list and a tuple walk into a sum()

As a direct side effect of glom's 19.1.0 release, the authors here at PDW got to re-experience one of the more surprising behaviors of three of Python's most basic constructs:
Most experienced developers know the quickest way to combine a short list of short lists:
list_of_lists = [[1], [2], [3, 4]]
sum(list_of_lists, [])
# [1, 2, 3, 4]
Ah, nice and flat, much better.

But what happens when we throw a tuple into the mix:
list_of_seqs = [[1], [2], (3, 4)]
sum(list_of_seqs, [])
# TypeError: can only concatenate list (not "tuple") to list
This is kind of surprising! Especially when you consider this:
seq = [1, 2]
seq += (3, 4)
# [1, 2, 3, 4]
Why should sum() fail when addition succeeds?! We'll get to that.
new_list = [1, 2] + (3, 4)
# TypeError: can only concatenate list (not "tuple") to list
There's that error again!

The trick here is that Python has two addition operators. The simple "+" or "add" operator, used by sum(), and the more nuanced "+=" or "iadd" operator, add's inplace variant.

But why is ok for one addition to error and the other to succeed?

Symmetry. And maybe commutativity if you remember that math class.

"+" in Python is symmetric: A + B and B + A should always yield the same result. To do otherwise would be more surprising than any of the surprises above. list and tuple cannot be added with this operator because in a mixed-type situation, the return type would change based on ordering.

Meanwhile, "+=" is asymmetric. The left side of the statement determines the type of the return completely. A += B keeps A's type. A straightforward, Pythonic reason if there ever was one.

Going back to the start of our story, by building on operator.iadd, glom's new flatten() function avoids sum()'s error-raising behavior and works wonders on all manner of nesting iterable.

3 comments:

  1. Wow, this is new to me. Thanks for sharing it.

    ReplyDelete
  2. "+" is not very symmetric when you're adding strings... (or lists, for that matter, [1] + [2] != [2] + [1]).

    ReplyDelete
  3. >>> a=[4,6]
    >>> a += (i**2 for i in range(4))
    >>> a
    [4, 6, 0, 1, 4, 9]

    ReplyDelete