Wednesday, September 14, 2022

Mock Everything

A mock object is meant to simulate any API for the purposes of testing.

The Python standard library includes MagicMock.

>>> from unittest.mock import MagicMock
>>> mock = MagicMock()
>>> mock.a
<MagicMock name='mock.a' id='281473174436496'>
>>> mock[0]
<MagicMock name='mock.__getitem__()' id='281473165975360'>
>>> mock + 1
<MagicMock name='mock.__add__()' id='281473165479264'>

However, there is one place where MagicMock fails.

>>> a, b = mock
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected 2, got 0)

The syntax which allows a comma-separated series of names on the left to unpack the value on the right is known as sequence unpacking in Python.

MagicMock is incompatible with sequence unpacking because of a limitation of operator overloading in Python when it comes to this piece of syntax.  Let's take a look at how the failing line compiles:

>>> import dis
>>> dis.dis(compile("a, b = fake", "string", "exec"))
  1           0 LOAD_NAME                0 (fake)
              2 UNPACK_SEQUENCE          2
              4 STORE_NAME               1 (a)
              6 STORE_NAME               2 (b)
              8 LOAD_CONST        

We pull fake onto the stack, run the opcode UNPACK_SEQUENCE with a parameter of 2, then store the results into a and b.  The issue is that MagicMock.__iter__() has no way of knowing that UNPACK_SEQUENCE is expecting two values.

So, let's cheat and figure out how to do it anyway.

>>> import sys
>>> class MagicSequence:
...    def __iter__(self):
...       # get the python stack frame which is calling this one
...       frame = sys._getframe(1)
...       # which instruction index is that frame on
...       opcode_idx = frame.f_lasti
...       # what instruction does that index translate to
...       opcode = frame.f_code.co_code[opcode_idx]
...       # is it a sequence unpack instruction?
...       if opcode == 92:  # opcode.opmap['UNPACK_SEQUENCE']
...          # the next byte after the opcode is its parameter,
...          # which is the length that the sequence unpack expects
...          opcode_param = frame.f_code.co_code[opcode_idx + 1]
...          # return an iterator of the expected length
...          return iter(range(opcode_param))
...       return iter([])  # otherwise, return an empty iterator
>>> a, b = MagicSequence()
>>> a, b
(0, 1)
>>> a, b, c = MagicSequence()
>>> a, b, c
(0, 1, 2)
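
Hard-coding 92 ties the trick to whichever CPython version assigned that number to UNPACK_SEQUENCE, since opcode numbers are an implementation detail. Here is a minimal sketch of the same idea that asks the dis module to decode the calling frame instead. It is a variation on the class above rather than a hardened replacement, and it still relies on the CPython-specific sys._getframe() and f_lasti:

```python
import dis
import sys


class MagicSequence:
    def __iter__(self):
        # Frame of the code that is iterating over this object.
        frame = sys._getframe(1)
        # Decode the caller's bytecode and look for the instruction
        # located at the offset the caller is currently executing.
        for instr in dis.get_instructions(frame.f_code):
            if instr.offset == frame.f_lasti:
                if instr.opname == "UNPACK_SEQUENCE":
                    # instr.arg is the number of targets on the left-hand side.
                    return iter(range(instr.arg))
                break
        # Any other caller (list(), a for loop, ...) gets an empty iterator.
        return iter([])


a, b = MagicSequence()
x, y, z = MagicSequence()
```

Looking the instruction up by name avoids the magic number, and reading instr.arg lets dis handle the argument decoding instead of indexing into co_code by hand.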

Sunday, January 16, 2022

Are they equal?

>>> a = []; a.append(a); a
[[...]]
>>> b = []; b.append(b); b
[[...]]
>>> a == b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RecursionError: maximum recursion depth exceeded in comparison

Thursday, September 3, 2020

Not counting zeros

We all have our favorite way of intentionally raising an exception in Python. Some like referencing an undefined variable to get a simple NameError, others might import a module that doesn't exist for a bold ImportError.

But the tasteful exceptioneer knows to reach for that classic computer-confounding conundrum: 1/0 for a satisfyingly descriptive ZeroDivisionError.

So, when does dividing by 0 not raise a ZeroDivisionError?

Why, when you divide 0 by a Decimal(0), of course!

>>> from decimal import Decimal
>>> Decimal(0) / Decimal(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
decimal.InvalidOperation: [<class 'decimal.DivisionUndefined'>]
>>> Decimal(1) / Decimal(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
decimal.DivisionByZero: [<class 'decimal.DivisionByZero'>]

The numerator type doesn't seem to matter either:

>>> 0 / Decimal(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
decimal.InvalidOperation: [<class 'decimal.DivisionUndefined'>]
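
The practical consequence shows up in except clauses. decimal.DivisionByZero inherits from the built-in ZeroDivisionError, while decimal.InvalidOperation does not, so a handler written for ordinary division by zero misses the 0/0 case. A small sketch, assuming the default decimal context (which traps both signals); classify() is a hypothetical helper name:

```python
from decimal import Decimal, DivisionByZero, InvalidOperation


def classify(numerator, denominator):
    """Report which kind of exception, if any, the division raises."""
    try:
        numerator / denominator
    except ZeroDivisionError:
        # Catches the built-in error and decimal.DivisionByZero alike.
        return "ZeroDivisionError"
    except InvalidOperation:
        # Decimal 0/0 lands here instead.
        return "InvalidOperation"
    return "no error"


results = {
    "1 / 0": classify(1, 0),
    "Decimal(1) / Decimal(0)": classify(Decimal(1), Decimal(0)),
    "Decimal(0) / Decimal(0)": classify(Decimal(0), Decimal(0)),
    "0 / Decimal(0)": classify(0, Decimal(0)),
}
```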

"InvalidOperation" just doesn't quite have the same ring to it! Well, they can't all be heroes. :)

Thursday, September 12, 2019

Welcome to the float zone...

Consider a REPL with two tuples, a and b.

>>> type(a), type(b)
(<type 'tuple'>, <type 'tuple'>)
>>> a == b
True

So far, so good.  But let's dig deeper...

>>> a[0] == b[0]
False

The tuples are equal, but their contents are not.

>>> a is b
True

In fact, there was only ever one tuple.
What is this madness?

>>> a
(nan,)

Welcome to the float zone.

Many parts of Python assume that a is b implies a == b, but NaN floats break this assumption, because NaN is not equal to anything, itself included.  They also break the assumption that hash(a) == hash(b) implies a == b.

>>> hash(float('nan')) == hash(float('nan'))
True

Dicts handle this pretty elegantly:

>>> n = float('nan')
>>> {n: 1}[n]
1

>>> a = {float('nan'): 1, float('nan'): 2}
>>> a
{nan: 1, nan: 2}
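
The surprise comes from containers checking identity before equality: tuple and list comparison, and the in operator, treat two elements as matching when they are the same object, and only fall back to == otherwise. A short sketch of that shortcut, written for Python 3; same_element() is a hypothetical helper that just spells the rule out:

```python
def same_element(x, y):
    """The 'is, then ==' rule containers apply to their elements."""
    return x is y or x == y


nan = float("nan")

# One tuple, two names, as in the session above.
a = (nan,)
b = a

tuples_equal = a == b            # True: the element is the same object
contents_equal = a[0] == b[0]    # False: NaN never equals itself

# Two distinct NaN objects get no identity shortcut.
fresh_tuples_equal = (float("nan"),) == (float("nan"),)   # False

# Membership follows the same rule.
found_same_object = nan in [nan]               # True
found_other_nan = float("nan") in [nan]        # False
```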

Monday, June 3, 2019

They say a Python tuple can't contain itself...

... but here at PDW we abhor that kind of defeatism!

>>> import ctypes
>>> tup = (None,)
>>> ctypes.pythonapi.PyTuple_SetItem.argtypes = ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p
>>> ctypes.pythonapi.PyTuple_SetItem(id(tup), 0, id(tup))

Showing the tuple itself is a little problematic:
>>> tup
# ... hundreds of lines of parens ...
((Segmentation fault

Wednesday, January 23, 2019

So a list and a tuple walk into a sum()

As a direct side effect of glom's 19.1.0 release, the authors here at PDW got to re-experience one of the more surprising behaviors of three of Python's most basic constructs: list, tuple, and sum().

Most experienced developers know the quickest way to combine a short list of short lists:

list_of_lists = [[1], [2], [3, 4]]
sum(list_of_lists, [])
# [1, 2, 3, 4]

Ah, nice and flat, much better.

But what happens when we throw a tuple into the mix?

list_of_seqs = [[1], [2], (3, 4)]
sum(list_of_seqs, [])
# TypeError: can only concatenate list (not "tuple") to list

This is kind of surprising! Especially when you consider this:

seq = [1, 2]
seq += (3, 4)
# [1, 2, 3, 4]

Why should sum() fail when addition succeeds?! We'll get to that. First, plain addition:

new_list = [1, 2] + (3, 4)
# TypeError: can only concatenate list (not "tuple") to list

There's that error again!

The trick here is that Python has two addition operators: the simple "+" or "add" operator, used by sum(), and the more nuanced "+=" or "iadd" operator, add's in-place variant.

But why is it OK for one addition to error and the other to succeed?

Symmetry. And maybe commutativity if you remember that math class.

"+" in Python is symmetric with respect to type: A + B and B + A should always yield the same type of result. To do otherwise would be more surprising than any of the surprises above. list and tuple cannot be added with this operator because in a mixed-type situation, the return type would change based on ordering.

Meanwhile, "+=" is asymmetric. The left side of the statement determines the type of the result completely. A += B keeps A's type. A straightforward, Pythonic reason if there ever was one.

Going back to the start of our story, by building on operator.iadd, glom's new flatten() function avoids sum()'s error-raising behavior and works wonders on all manner of nested iterables.
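
The difference is easy to reproduce with functools.reduce. This is a minimal sketch of an iadd-based flatten, not glom's actual implementation; flatten_once is a hypothetical name, and it only removes one level of nesting:

```python
import operator
from functools import reduce


def flatten_once(seqs):
    """Concatenate the items of `seqs` into a new list using in-place addition."""
    # Starting from a fresh list means no input is mutated, and list +=
    # accepts any iterable on the right, so tuples are fine.
    return reduce(operator.iadd, seqs, [])


mixed = [[1], [2], (3, 4)]
flat = flatten_once(mixed)  # [1, 2, 3, 4]

# The add-based version still refuses the same input.
try:
    sum(mixed, [])
except TypeError as exc:
    sum_error = exc
```

Because the accumulator is always the list passed as the initial value, the result type is fixed up front, which is exactly the asymmetry that makes "+=" safe here.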

Friday, September 14, 2018

kids these days think data structures grow on trees

Args and kwargs are great features of Python.  However, they come with a measurable (though highly variable) cost:

>>> import timeit
>>> timeit.timeit(lambda: (lambda a, b: None)(1, b=2))

>>> timeit.timeit(lambda: (lambda *a, **kw: None)(1, b=2))

>>> timeit.timeit(lambda: (lambda *a, **kw: None)(1, b=2)) - timeit.timeit(lambda: (lambda a, b: None)(1, b=2))

Constructing that dict and tuple doesn't happen for free:

>>> timeit.timeit(lambda: ((1,), {'b': 2})) - timeit.timeit(lambda: None)

Specifically, it takes about 1/5,000,000th of a second.
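
timeit.timeit() runs its callable 1,000,000 times by default and returns the total in seconds, so a total difference of 0.2 seconds would correspond to 1/5,000,000th of a second per call. A sketch that does the division explicitly; per_call is a hypothetical helper, the repeat count is an arbitrary choice, and the figures will differ by machine and Python version:

```python
import timeit

NUMBER = 200_000  # fewer runs than timeit's default of 1,000,000, to keep this quick


def per_call(func, number=NUMBER):
    """Average seconds per call of `func` over `number` runs."""
    return timeit.timeit(func, number=number) / number


explicit = per_call(lambda: (lambda a, b: None)(1, b=2))
variadic = per_call(lambda: (lambda *a, **kw: None)(1, b=2))

# Cost of building the tuple and dict on their own, minus the bare call overhead.
containers = per_call(lambda: ((1,), {'b': 2})) - per_call(lambda: None)

print(f"explicit args:   {explicit * 1e9:.0f} ns per call")
print(f"*args, **kwargs: {variadic * 1e9:.0f} ns per call")
print(f"tuple + dict:    {containers * 1e9:.0f} ns per call")
```

Single runs are noisy, so small differences like these are best compared across several repetitions rather than read off one measurement.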