Python Does What?!?: d800 + dc00 = 10000

Thursday, March 17, 2016

The length of a unicode string is not always equal to the number of characters in it. (This depends on how Python was compiled: narrow builds store text as 16-bit code units, and wide builds store one unit per code point.)

Two one-character unicode strings:
>>> u'\U00010000'
u'\U00010000'
>>> u'\U00008000'
u'\U00008000'

But they aren't exactly the same:
>>> len(u'\U00008000')
1
>>> len(u'\U00010000')
2

Can you guess what the two characters will be?
>>> u'\U00010000'[0]
u'\ud800'
>>> u'\U00010000'[1]
u'\udc00'

>>> u'\ud800' + u'\udc00'
u'\U00010000'
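
The arithmetic behind the title can be sketched as follows. This is a minimal illustration of the standard UTF-16 surrogate-pair formula, not code from Python itself; the function names `to_surrogates` and `from_surrogates` are made up for this example.

```python
HIGH_START = 0xD800           # first high (lead) surrogate
LOW_START = 0xDC00            # first low (trail) surrogate
SUPPLEMENTARY_BASE = 0x10000  # first code point that needs a surrogate pair


def to_surrogates(code_point):
    """Split a code point in 0x10000..0x10FFFF into a (high, low) pair."""
    if not SUPPLEMENTARY_BASE <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - SUPPLEMENTARY_BASE  # a 20-bit value
    high = HIGH_START + (offset >> 10)        # top 10 bits
    low = LOW_START + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    return SUPPLEMENTARY_BASE + ((high - HIGH_START) << 10) + (low - LOW_START)


print([hex(unit) for unit in to_surrogates(0x10000)])  # ['0xd800', '0xdc00']
print(hex(from_surrogates(0xD800, 0xDC00)))            # 0x10000
```

So d800 and dc00 are simply the pair with a zero offset, which is why they "add up" to 10000.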

(Mahmoud):

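The length of a unicode string is actually the number of code units it occupies in memory, not the number of characters. On a narrow build each code unit is 16 bits. The character with len 1 (耀, U+8000, for the curious) takes half the space of the character with len 2 (𐀀, U+10000). They were chosen because U+8000 fits into two bytes in memory, while U+10000 is the first code point that does not: it spills over into a surrogate pair, which is two 16-bit code units, or four bytes.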
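
One way to see the code-unit count on any build is to encode to UTF-16 and count 16-bit units. This is only a sketch; `utf16_units` is a hypothetical helper name, and it measures the encoded form rather than inspecting Python's internal storage.

```python
def utf16_units(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    # 'utf-16-le' emits no byte-order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


print(utf16_units(u"\U00008000"))  # 1
print(utf16_units(u"\U00010000"))  # 2
```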

You can check how your Python build stores these characters in memory by running:

>>> import sys
>>> sys.maxunicode

If it's 1114111 (0x10FFFF), you've got the UCS-4 (wide) in-memory representation and will get a len of 1 for both characters above. If it's 65535 (0xFFFF), you've got UCS-2 (narrow), and you'll get the confusing and arguably wrong lengths.
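
The same check can be wrapped in a small function. This is a sketch only; `unicode_build` is a name invented here, and the "wide"/"narrow" labels follow the terminology above.

```python
import sys


def unicode_build():
    """Return 'wide' or 'narrow' based on sys.maxunicode."""
    # Narrow builds report 0xFFFF; wide builds report 0x10FFFF.
    return "wide" if sys.maxunicode > 0xFFFF else "narrow"


print(hex(sys.maxunicode), unicode_build())
```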

These settings are configured when Python is built, and cannot be changed at runtime. Python 3.3 and later eliminate this distinction altogether (PEP 393): len counts code points, and sys.maxunicode is always 1114111.
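
If you are stuck on a narrow build and need the number of characters rather than code units, a surrogate-aware count is one option. The sketch below assumes that a high surrogate immediately followed by a low surrogate should count as one character; `code_point_len` is a hypothetical helper, not a standard library function.

```python
def code_point_len(text):
    """Count code points, treating each surrogate pair as one character."""
    count = 0
    i = 0
    while i < len(text):
        unit = ord(text[i])
        # A high surrogate followed by a low surrogate forms one code point.
        if (0xD800 <= unit <= 0xDBFF and i + 1 < len(text)
                and 0xDC00 <= ord(text[i + 1]) <= 0xDFFF):
            i += 2
        else:
            i += 1
        count += 1
    return count


print(code_point_len(u"\U00008000"))  # 1
print(code_point_len(u"\U00010000"))  # 1
```

It gives the same answer on narrow and wide builds, because it only merges units that actually form a pair.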
