Monday, March 6, 2023

Annotation Inheritance

Let's talk about annotations.

Type annotations in Python are mostly a static declaration to a type-checker like mypy or pyright about the expected types. However, they are also a dynamic data structure that a growing number of libraries use at runtime: the original attrs, dataclasses in the standard library, and even sqlalchemy.
>>> from dataclasses import dataclass
>>>
>>> @dataclass
... class C:
...    a: int
...    b: str
...
>>> C(1, "a")
C(a=1, b='a')
These libraries inspect the annotations of a class to generate __init__ and __eq__, saving a lot of boilerplate code. You could call this type of API named tuple without the tuple. (To get meta, the typing module has added dataclass_transform which libraries can use to properly annotate new class decorators with this API.)
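As a rough illustration of that inspection (a sketch of what is observable from outside, not of how either library works internally), dataclasses.fields reports what the decorator collected and inspect.signature shows the generated __init__. The class name Point is made up for this example.

```python
import inspect
from dataclasses import dataclass, fields


@dataclass
class Point:  # hypothetical example class
    x: int
    y: str


# The raw annotations that the decorator reads at runtime.
print(Point.__annotations__)

# The fields that dataclass derived from those annotations.
print([(f.name, f.type) for f in fields(Point)])

# The parameter names of the generated __init__.
print(list(inspect.signature(Point).parameters))  # ['x', 'y']

# The generated __eq__ compares instances field by field.
print(Point(1, "a") == Point(1, "a"))  # True
```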

These libraries support inheritance of fields.
>>> @dataclass
... class D(C):
...    e: int
...
>>> D(1, "a", 2)
D(a=1, b='a', e=2)
Type checkers also consider class annotations to be inherited. For example, mypy considers this to be correct:
class A:
    a: int

class B(A): pass

B().a
That code fails at runtime, because nothing is actually setting a on the B instance. But what if B were a dataclass?
>>> class A:
...    a: int
...
>>> @dataclass
... class B(A):
...    pass
...
>>> B(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__() takes 1 positional argument but 2 were given
It doesn't work, because annotations are not inherited.
>>> A.__annotations__
{'a': <class 'int'>}
>>> B.__annotations__
{}
It's up to the library to look up its inheritance tree and decide whether or not to include the annotations of parents when generating code. As it happens, dataclasses has made the design decision to only inherit annotations from other dataclasses.
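The standard library itself shows both choices. As a small sketch (Plain and Child are names invented here), typing.get_type_hints merges annotations along the MRO, while dataclasses.fields only reports fields from the class itself and from parents that are dataclasses.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


class Plain:  # hypothetical parent that is NOT a dataclass
    a: int


@dataclass
class Child(Plain):
    b: str


# get_type_hints walks the MRO and merges every class's annotations.
print(get_type_hints(Child))  # {'a': <class 'int'>, 'b': <class 'str'>}

# dataclass skips annotations on parents that are not dataclasses.
print([f.name for f in fields(Child)])  # ['b']
```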

As an aside, class variables which are used to represent default values are inherited.
>>> class A:
...    a = 1
...
>>> @dataclass
... class B(A):
...    a: int
...
>>> B()
B(a=1)
We can write another decorator which grabs annotations from parents and adds them in method resolution order, as if they were inherited.
def inherit_annotations(cls):
    annotations = {}
    # reverse order so children override parents
    for parent in cls.__mro__[::-1]:
        # use getattr(): not everything has __annotations__
        annotations.update(getattr(parent, "__annotations__", {}))
    cls.__annotations__.update(annotations)
    return cls
Since all dataclasses sees is the __annotations__ dict at runtime, any modifications made before the class decorator runs will be reflected in the generated fields.
>>> @dataclass
... @inherit_annotations
... class B(A): pass
...
>>> B(1)
B(a=1)
Here's a robustified version of the function.
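One possible hardening, sketched here under my own assumptions rather than taken from the author's code, is to assign a freshly built dict instead of updating cls.__annotations__ in place. On Python versions before 3.10, a class without annotations of its own can resolve __annotations__ to a parent's dict, so an in-place update may touch the parent and leave the child with nothing of its own. Base and Child are hypothetical names.

```python
from dataclasses import dataclass, fields


def inherit_annotations(cls):
    """Give cls its own __annotations__ dict merged from its whole MRO."""
    merged = {}
    # Walk from object down to cls so that children override parents.
    for parent in reversed(cls.__mro__):
        # getattr(): not every class in the MRO has annotations.
        merged.update(getattr(parent, "__annotations__", {}))
    # Assign a fresh dict rather than updating in place, so a dict that
    # belongs to a parent class is never mutated.
    cls.__annotations__ = merged
    return cls


class Base:  # hypothetical undecorated parent
    a: int


@dataclass
@inherit_annotations
class Child(Base):
    b: str = "x"


print([f.name for f in fields(Child)])  # ['a', 'b']
print(Child(1))                         # Child(a=1, b='x')
print(Base.__annotations__)             # {'a': <class 'int'>}
```

Because the merge starts at the top of the MRO, parent fields come first in the generated __init__, which matches the order dataclasses uses for inherited fields.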

I know what you're thinking though: why not just use multiple class decorators? Sure, all but one of the generated __init__s will be overwritten, but that's fine because they all have the same behavior anyway.
import attr
from dataclasses import dataclass


@dataclass
@attr.define
class DualCitizen:
    a: int

@dataclass
class Dataclassified(DualCitizen):
    pass

@attr.define
class Attrodofined(DualCitizen):
    pass
Looks like perfectly normal class definitions.
>>> DualCitizen(1)
DualCitizen(a=1)
>>> Dataclassified(1)
Dataclassified(a=1)
>>> Attrodofined(1)
Attrodofined(1)
And it works.

So, type-checkers consider annotations to be inherited, but class decorators which use annotations at runtime only inherit annotations from ancestors with the same decorator. We can work around this either by multiply decorating the ancestors, or by pulling annotations from ancestors into __annotations__.

Wednesday, September 14, 2022

Mock Everything

A mock object is meant to simulate any API for the purposes of testing.

The Python standard library includes MagicMock.

>>> from unittest.mock import MagicMock
>>> mock = MagicMock()
>>> mock.a
<MagicMock name='mock.a' id='281473174436496'>
>>> mock[0]
<MagicMock name='mock.__getitem__()' id='281473165975360'>
>>> mock + 1
<MagicMock name='mock.__add__()' id='281473165479264'>

However, there is one place where MagicMock fails.

>>> a, b = mock
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected 2, got 0)

The syntax which allows a comma-separated series of names on the left to unpack the value on the right is known as sequence unpacking in Python.

MagicMock is incompatible with sequence unpacking because of a limitation of operator overloading in Python when it comes to this piece of syntax.  Let's take a look at how the failing line compiles:

>>> import dis
>>> dis.dis(compile("a, b = fake", "string", "exec"))
  1           0 LOAD_NAME                0 (fake)
              2 UNPACK_SEQUENCE          2
              4 STORE_NAME               1 (a)
              6 STORE_NAME               2 (b)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

We pull fake onto the stack, run the opcode UNPACK_SEQUENCE with a parameter of 2, then store the results into a and b.  The issue is that MagicMock.__iter__() has no way of knowing that UNPACK_SEQUENCE is expecting two values.
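The conventional route is to tell the mock ahead of time what to yield. As a minimal sketch, MagicMock lets you configure the return value of its __iter__ method; the two strings are placeholder values chosen for this example.

```python
from unittest.mock import MagicMock

mock = MagicMock()
# Configure the iterable that the mock's __iter__ hands back.
mock.__iter__.return_value = ["first", "second"]

a, b = mock
print(a, b)  # first second
```

That only works when the test author already knows how many names sit on the left-hand side.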

So, let's cheat and figure out how to do it anyway.

>>> import sys
>>> class MagicSequence:
...    def __iter__(self):
...       # get the python stack frame which is calling this one
...       frame = sys._getframe(1)
...       # which instruction index is that frame on
...       opcode_idx = frame.f_lasti
...       # what instruction does that index translate to
...       opcode = frame.f_code.co_code[opcode_idx]
...       # is it a sequence unpack instruction?
...       if opcode == dis.opmap['UNPACK_SEQUENCE']:  # 92 at the time of writing
...          # the next byte after the opcode is its parameter,
...          # which is the length that the sequence unpack expects
...          opcode_param = frame.f_code.co_code[opcode_idx + 1]
...          # return an iterator of the expected length
...          return iter(range(opcode_param))
...       return iter([])  # otherwise, return an empty iterator
...
>>> a, b = MagicSequence()
>>> a, b
(0, 1)
>>> a, b, c = MagicSequence()
>>> a, b, c
(0, 1, 2)

Sunday, January 16, 2022

Are they equal?

>>> a = []; a.append(a); a
[[...]]
>>> b = []; b.append(b); b
[[...]]
>>> a == b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RecursionError: maximum recursion depth exceeded in comparison

Thursday, September 3, 2020

Not counting zeros

We all have our favorite way of intentionally raising an exception in Python. Some like referencing an undefined variable to get a simple NameError, others might import a module that doesn't exist for a bold ImportError.

But the tasteful exceptioneer knows to reach for that classic computer-confounding conundrum: 1/0 for a satisfyingly descriptive ZeroDivisionError.

So, when does dividing by 0 not raise ZeroDivisionError?

Why, when you divide 0 by a Decimal(0), of course!

>>> from decimal import Decimal
>>> Decimal(0) / Decimal(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
decimal.InvalidOperation: [<class 'decimal.DivisionUndefined'>]
>>> Decimal(1) / Decimal(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
decimal.DivisionByZero: [<class 'decimal.DivisionByZero'>]

The numerator type doesn't seem to matter either:

>>> 0 / Decimal(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
decimal.InvalidOperation: [<class 'decimal.DivisionUndefined'>]
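The difference is easier to see in the class hierarchy than in the tracebacks. In this small sketch, division_error is a helper name invented for the example, and the default decimal context is assumed (it traps both signals).

```python
from decimal import Decimal, DivisionByZero, InvalidOperation


def division_error(numerator, denominator):
    """Return the exception class raised by the division, or None."""
    try:
        numerator / denominator
    except ArithmeticError as exc:
        return type(exc)
    return None


print(division_error(1, 0))                    # built-in ZeroDivisionError
print(division_error(Decimal(1), Decimal(0)))  # decimal.DivisionByZero
print(division_error(0, Decimal(0)))           # decimal.InvalidOperation

# decimal.DivisionByZero is still a ZeroDivisionError; InvalidOperation is not.
print(issubclass(DivisionByZero, ZeroDivisionError))    # True
print(issubclass(InvalidOperation, ZeroDivisionError))  # False
```

So an except ZeroDivisionError clause catches 1/0 and Decimal(1) / Decimal(0), but lets 0 / Decimal(0) through.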

"InvalidOperation" just doesn't quite have the same ring to it! Well, they can't all be heroes. :)

Thursday, September 12, 2019

Welcome to the float zone...

Consider a REPL with two tuples, a and b.

>>> type(a), type(b)
(<type 'tuple'>, <type 'tuple'>)
>>> a == b
True


So far, so good.  But let's dig deeper...

>>> a[0] == b[0]
False


The tuples are equal, but their contents are not.



>>> a is b
True





In fact, there was only ever one tuple.
What is this madness?

>>> a
(nan,)


Welcome to the float zone.

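Many parts of Python assume that a is b implies a == b, but floats break this assumption.  They also break the assumption that hash(a) == hash(b) implies a == b.  (Before Python 3.10 every NaN hashed to 0; since 3.10 the hash of a NaN is derived from the object's identity.)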

>>> hash(float('nan')) == hash(float('nan'))
True


Dicts handle this pretty elegantly:

>>> n = float('nan')
>>> {n: 1}[n]
1


>>> a = {float('nan'): 1, float('nan'): 2}
>>> a
{nan: 1, nan: 2}
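Put together as a script, the sessions above might look like the following sketch. The variable names are illustrative. The behavior it relies on is that containers check identity before falling back to ==, so the very same NaN object matches itself inside a tuple or dict while two distinct NaN objects never match.

```python
nan = float("nan")
a = (nan,)
b = a  # the same tuple under a second name

print(a is b)        # True
print(a == b)        # True: items are compared by identity first
print(a[0] == b[0])  # False: NaN never equals itself

# Two distinct NaN objects get no identity shortcut.
print((float("nan"),) == (float("nan"),))  # False

# Dict lookup with the same object succeeds via identity.
lookup = {nan: 1}
print(lookup[nan])  # 1

# Distinct NaN objects are distinct keys.
two_keys = {float("nan"): 1, float("nan"): 2}
print(len(two_keys))  # 2
```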