Thursday, September 12, 2019

Welcome to the float zone...

Consider a REPL with two tuples, a and b.

>>> type(a), type(b)
(<type 'tuple'>, <type 'tuple'>)
>>> a == b
True


So far, so good.  But let's dig deeper...

>>> a[0] == b[0]
False


The tuples are equal, but their contents is not.



>>> a is b
True





In fact, there was only ever one tuple.
What is this madness?

>>> a
(nan,)


Welcome to the float zone.

Many parts of python assume that a is b implies a == b, but floats break this assumption.  They also break the assumption that hash(a) == hash(b) implies a == b.

>>> hash(float('nan')) == hash(float('nan'))
True


Dicts handle this pretty elegantly:

>>> n = float('nan')
>>> {n: 1}[n]
1


>>> a = {float('nan'): 1, float('nan'): 2}
>>> a
{nan: 1, nan: 2}

Monday, June 3, 2019

They say a python tuple can't contain itself...

... but here at PDW we abhor that kind of defeatism!

>>> import ctypes
>>> tup = (None,)
>>> ctypes.pythonapi.PyTuple_SetItem.argtypes = ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p
>>> ctypes.pythonapi.PyTuple_SetItem(id(tup), 0, id(tup))
0


Showing the tuple itself is a little problematic
>>> tup
# ... hundreds of lines of parens ...
(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
((Segmentation fault

Wednesday, January 23, 2019

So a list and a tuple walk into a sum()

As a direct side effect of glom's 19.1.0 release, the authors here at PDW got to re-experience one of the more surprising behaviors of three of Python's most basic constructs:
Most experienced developers know the quickest way to combine a short list of short lists:
list_of_lists = [[1], [2], [3, 4]]
sum(list_of_lists, [])
# [1, 2, 3, 4]
Ah, nice and flat, much better.

But what happens when we throw a tuple into the mix:
list_of_seqs = [[1], [2], (3, 4)]
sum(list_of_seqs, [])
# TypeError: can only concatenate list (not "tuple") to list
This is kind of surprising! Especially when you consider this:
seq = [1, 2]
seq += (3, 4)
# [1, 2, 3, 4]
Why should sum() fail when addition succeeds?! We'll get to that.
new_list = [1, 2] + (3, 4)
# TypeError: can only concatenate list (not "tuple") to list
There's that error again!

The trick here is that Python has two addition operators. The simple "+" or "add" operator, used by sum(), and the more nuanced "+=" or "iadd" operator, add's inplace variant.

But why is ok for one addition to error and the other to succeed?

Symmetry. And maybe commutativity if you remember that math class.

"+" in Python is symmetric: A + B and B + A should always yield the same result. To do otherwise would be more surprising than any of the surprises above. list and tuple cannot be added with this operator because in a mixed-type situation, the return type would change based on ordering.

Meanwhile, "+=" is asymmetric. The left side of the statement determines the type of the return completely. A += B keeps A's type. A straightforward, Pythonic reason if there ever was one.

Going back to the start of our story, by building on operator.iadd, glom's new flatten() function avoids sum()'s error-raising behavior and works wonders on all manner of nesting iterable.