Thursday, March 17, 2016

d800 + dc00 = 10000

The length of a unicode string does not always equal the number of characters inside it.  (This depends on whether Python was compiled as a narrow or a wide unicode build.)

Two one-character unicode strings:
>>> u'\U00010000'
u'\U00010000'

>>> u'\U00008000'
u'\U00008000'


But they aren't exactly the same:
>>> len(u'\U00008000')
1
>>> len(u'\U00010000')
2


Can you guess what the two characters will be?
>>> u'\U00010000'[0]
u'\ud800'
>>> u'\U00010000'[1]
u'\udc00'

>>> u'\ud800' + u'\udc00'
u'\U00010000'
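
The arithmetic behind the title can be spelled out. Here is a minimal sketch of the UTF-16 surrogate rules, using two hypothetical helpers, to_surrogates and from_surrogates (illustrative names, not part of the standard library). It works on plain integers, so the result does not depend on how Python was built:

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are stored as surrogate pairs")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits go in the low surrogate
    return high, low


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x10000)
print("%x + %x = %x" % (high, low, from_surrogates(high, low)))
```

So the "+" in the title is not ordinary addition: each surrogate carries 10 bits of the code point's offset from 0x10000.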


(Mahmoud):

The length reported for a unicode string is really its length as stored in memory, counted in code units rather than characters. The first character (耀 for the curious) takes half the space of the second character (𐀀). They were chosen arbitrarily because one fits into two bytes in memory and the other spills over into four bytes: on a narrow build it is stored as two 16-bit code units, a surrogate pair.
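
One way to see these sizes on any build is to encode the string as UTF-16 and count the 16-bit units. This is a rough sketch; utf16_code_units is a hypothetical helper name, and it mirrors what a narrow build stores rather than inspecting Python's actual memory:

```python
def utf16_code_units(text):
    """Count the 16-bit code units needed to store text as UTF-16."""
    # "utf-16-le" writes no byte-order mark, so every code unit is two bytes.
    return len(text.encode("utf-16-le")) // 2


print(utf16_code_units(u'\U00008000'))  # one code unit, two bytes
print(utf16_code_units(u'\U00010000'))  # two code units (a surrogate pair), four bytes
```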

You can check how your Python build stores these characters in memory by running

>>> import sys
>>> sys.maxunicode

If it's greater than 65535 (in practice 1114111, i.e. 0x10FFFF), then you've got UCS-4 (wide) in-memory representation and will get a len of 1 for both characters above. If it's 65535 (0xFFFF), then you've got UCS-2 (narrow), and you'll get the confusing and arguably wrong lengths.
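
The same check can be wrapped in a small function. This is only a sketch; describe_build is a hypothetical name, and the maxunicode parameter exists purely so that both cases can be tried on a single interpreter:

```python
import sys


def describe_build(maxunicode=sys.maxunicode):
    """Classify a build as "wide" or "narrow" from its sys.maxunicode value."""
    # Narrow builds report 0xFFFF (65535); wide builds report 0x10FFFF (1114111).
    if maxunicode > 0xFFFF:
        return "wide"
    return "narrow"


print(describe_build())  # result for the interpreter running this code
```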

These settings are fixed when Python is built, and cannot be changed at runtime. Python 3.3 and later eliminate this distinction altogether (PEP 393): sys.maxunicode is always 1114111 there, and len counts code points.
