Thursday, March 17, 2016

d800 + dc00 = 10000

The len() of a unicode string is not always equal to the number of characters in it.  (This depends on how Python was compiled.)

Two one-character unicode strings:
>>> u'\U00010000'
u'\U00010000'

>>> u'\U00008000'
u'\U00008000'


But they aren't exactly the same:
>>> len(u'\U00008000')
1
>>> len(u'\U00010000')
2


Can you guess what the two characters will be?
>>> u'\U00010000'[0]
u'\ud800'
>>> u'\U00010000'[1]
u'\udc00'

>>> u'\ud800' + u'\udc00'
u'\U00010000'
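
The arithmetic behind the title can be spelled out. This is a minimal sketch of the standard UTF-16 surrogate formula; the helper names to_surrogates and from_surrogates are made up for illustration, and the code is written for Python 3 so it behaves the same on every build.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = cp - 0x10000            # 20 bits of payload
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x10000)
print(hex(high), hex(low))              # 0xd800 0xdc00
print(hex(from_surrogates(high, low)))  # 0x10000
```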


(Mahmoud):

The len() of a unicode string is actually its length in code units as represented in memory. The first character (耀 for the curious) takes half the space of the second character (𐀀). They were chosen because one fits into a single two-byte code unit in memory, and the other spills over into a second code unit (a surrogate pair, four bytes in all).

You can check how your Python build stores these characters in memory by running

>>> import sys
>>> sys.maxunicode

If it's 1114111 (0x10FFFF) then you've got a UCS-4 (wide) in-memory representation and will get a len of 1 for both characters above. If it's 65535 (0xFFFF), then you've got UCS-2 (narrow), and you'll get the confusing and arguably wrong lengths.

These settings are fixed when Python is built, and cannot be changed at runtime. Python 3.3 and later eliminate this distinction altogether (PEP 393).
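
To see what a narrow build would report without having one at hand, you can count UTF-16 code units directly. A rough sketch for Python 3; utf16_units is a hypothetical helper name, not a standard library function.

```python
import sys


def utf16_units(s):
    """Number of 16-bit code units needed to store s as UTF-16."""
    # "utf-16-le" emits no byte order mark, so bytes / 2 == code units.
    return len(s.encode("utf-16-le")) // 2


# True on wide builds, and on every Python 3.3+ build.
is_wide = sys.maxunicode > 0xFFFF
print(is_wide)

# Code points vs. code units for the two characters above.
print(len("\u8000"), utf16_units("\u8000"))
print(len("\U00010000"), utf16_units("\U00010000"))
```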

unicode + ord

The ord() built-in may return values greater than 255 when handed a 1-character unicode string:

>>> ord(u'\U00008000')
32768


This means that chr(ord(s)) will not always work, because Python 2's chr() only accepts values in range(256). unichr() is the unicode counterpart.

>>> chr(ord(u'\U00008000'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: chr() arg not in range(256)
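
One way to round-trip safely is to reach for unichr() where it exists. A small sketch, assuming you want the same code to run on Python 2 and Python 3 (where chr() already accepts any code point and unichr is gone):

```python
try:
    unichr          # Python 2: the unicode counterpart of chr()
except NameError:
    unichr = chr    # Python 3: chr() already covers every code point

s = u"\u8000"
round_tripped = unichr(ord(s))
print(round_tripped == s)
```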

Tuesday, February 9, 2016

Undecoratable

Decorators are one of Python's bigger success stories, and many programmers' first experience with higher-order programming. Most practiced and prolific Python programmers will find themselves making good use of them regularly.

But every feature has its limits, and here's a new one to try on for size:

>>> @x().y()
  File "<stdin>", line 1
    @x().y()
        ^
SyntaxError: invalid syntax

That's right, a decorator cannot be an arbitrary Python expression. It doesn't matter what x and y are, or even whether they are defined. You can't follow a function call with a dot. @x() works fine, and @x.y() would work fine, too. But @x().y(), that's only for mad Pythonists who would take things TOO FAR.

Decorator invocations, defined at the top of the Python grammar, can only be followed by class definitions and function definitions.
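
If you really need it, the usual workaround is to evaluate the expression first and decorate with the resulting name. A minimal sketch; x, y, and the Holder class are hypothetical stand-ins for whatever your real factory returns. (Python 3.9 relaxed the grammar via PEP 614, so @x().y() parses there.)

```python
def x():
    """Hypothetical factory standing in for the x() of the example."""
    class Holder(object):
        def y(self):
            def decorator(func):
                func.decorated = True   # mark the function so the effect is visible
                return func
            return decorator
    return Holder()


# Evaluate the "too far" expression once, then decorate with a plain name.
decorate = x().y()


@decorate
def hello():
    return "hi"


print(hello(), hello.decorated)
```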

Well, now we know, and now we can all say we've been there

-- Mahmoud
http://sedimental.org/
https://github.com/mahmoud
https://twitter.com/mhashemi

Wednesday, February 3, 2016

List Comprehension Code Golf

Ah, code golf, pastime of our navel-gazing alter egos. Being designed for readability and maintainability, Python doesn't always show well in this sort of sport, but occasionally we get thrown a bone. For instance, for nonzero even numbers less than 10:

>>> [x for x in range(10) if x and not x % 2]
[2, 4, 6, 8]

is equivalent to

>>> [x for x in range(10) if x if not x % 2]
[2, 4, 6, 8]

A whole character saved! Yes, a close reading of PEP 202 will show that one of the canonical examples of list comprehensions uses this pattern for... some reason.
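
Why it works: each extra "if" clause in a comprehension nests inside the one before it, so chaining them behaves like "and". The unrolled loop below is a sketch of that equivalence, not a suggestion for how to write it.

```python
evens = []
for x in range(10):
    if x:                 # first "if" clause
        if not x % 2:     # second "if" clause, nested inside the first
            evens.append(x)

print(evens)  # [2, 4, 6, 8]
```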

Either way, now you know. Sally forth and do what must be done with all code golf tricks: Never, Ever Use Them For Production Code.

-- Mahmoud

Wednesday, October 14, 2015

Look before you leap

There are countless reasons to do unsafe memory access.  Here's some help checking whether an address is readable before you touch it:

import ctypes
import os

if hasattr(ctypes.cdll, "msvcrt"):  #windows branch
    _BUF = ctypes.create_string_buffer('a')


    def _check(address):
        try:
            ctypes.cdll.msvcrt.memcpy(_BUF, address, 1)
            return True
        except WindowsError:
            return False


else:
    libc = ctypes.CDLL('libc.so.6')
    libc.write.argtypes = ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t

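    # Scratch file: write() only succeeds if the source address is readable.
    _BUF = open('/tmp/_buf', 'w')
    _FD = _BUF.fileno()
    os.unlink(_BUF.name)  # unlink takes a path, not a file object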

    def _check(address):
        rc = libc.write(_FD, address, 1)
        _BUF.seek(0)
        return rc == 1


def is_valid_ptr(address):
    'determines if the passed address (integer) is a valid pointer'
    return _check(address)
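
For a quick way to try the idea, here is a trimmed-down sketch of the non-Windows branch for Python 3. It assumes a POSIX system where ctypes.CDLL(None) exposes the C library's write(), and it swaps the fixed /tmp path for tempfile.mkstemp(); neither choice comes from the code above.

```python
import ctypes
import os
import tempfile

# Assumption: POSIX, where CDLL(None) gives access to the C library's write().
libc = ctypes.CDLL(None, use_errno=True)
libc.write.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t]
libc.write.restype = ctypes.c_ssize_t

# Anonymous scratch file: create it, then unlink the path right away.
_FD, _PATH = tempfile.mkstemp()
os.unlink(_PATH)


def is_valid_ptr(address):
    """Return True if one byte can be read from address (an integer)."""
    rc = libc.write(_FD, address, 1)
    os.lseek(_FD, 0, os.SEEK_SET)   # rewind so the scratch file never grows
    return rc == 1


buf = ctypes.create_string_buffer(b"a")
print(is_valid_ptr(ctypes.addressof(buf)))   # address of a live buffer
print(is_valid_ptr(0))                       # NULL pointer
```

The kernel does the checking here: write() fails with EFAULT instead of crashing the process when it cannot read the source byte.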

Wednesday, September 23, 2015

Loopy references

This may not come as a surprise to experienced Python programmers, but objects in Python can reference or "contain" themselves and/or their parent objects.

A trivial example, the self-containing list:

>>> a = []
>>> a.append(a)
>>> a
[[...]]

Kind of a cool repr, at least. How else could you flatly represent this infinitely nested object? What happens when you take two of these and smash them together with an equals comparator?

>>> b = []
>>> b.append(b)
>>> a == b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: maximum recursion depth exceeded in cmp


Like two mirrors pointed at one another, they compare back and forth forever, or at least for...

>>> import sys
>>> sys.getrecursionlimit()
1000

... loops.
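
If you need to spot a loopy list without tripping the recursion limit, compare by identity instead of equality. A small sketch for Python 3, where the same comparison raises RecursionError (a subclass of RuntimeError); contains_itself is a hypothetical helper and only looks one level deep.

```python
a = []
a.append(a)
b = []
b.append(b)


def contains_itself(seq):
    """True if seq holds a direct reference to itself (identity, not equality)."""
    return any(item is seq for item in seq)


print(contains_itself(a))

# Equality still recurses until the interpreter gives up.
try:
    a == b
    raised = False
except RuntimeError as exc:   # RecursionError on Python 3.5+
    raised = True
    print(type(exc).__name__)
```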

Tuesday, July 7, 2015

collections.deque random access is O(n)

>>> import collections, timeit
>>> r = range(int(1e6))
>>> li = list(r)
>>> de = collections.deque(r)
>>> pos = len(de) // 2
>>> timeit.timeit(lambda: de[100])
0.1355267132935296
>>> timeit.timeit(lambda: li[100])
0.14215966414053582

Great, list and deque performance are identical.

>>> timeit.timeit(lambda: li[pos])
0.1369839611421071
>>> timeit.timeit(lambda: de[pos])
56.227819584345696

Oops.  Indexing into the middle is O(1) for a list vs O(n) for a deque.
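
A scaled-down version of the same measurement, written for Python 3 and using fewer repetitions so it finishes quickly. The timings will differ from machine to machine; the list snapshot at the end is just one possible way out when random access dominates.

```python
import collections
import timeit

n = 10 ** 6
li = list(range(n))
de = collections.deque(li)
pos = n // 2

# Deques are fast at both ends and slow in the middle.
t_end = timeit.timeit(lambda: de[0], number=1000)
t_mid = timeit.timeit(lambda: de[pos], number=1000)
print("end: %.6f s, middle: %.6f s" % (t_end, t_mid))

# If you need lots of random access, index a list snapshot instead.
snapshot = list(de)
print(snapshot[pos] == de[pos])
```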