Thursday, May 5, 2016

String optimization in Python

Strings are terribly important in programming. A program without some form of string input, manipulation, and output is a rarity.

Of course, this means that the speed and sanity of string features are important. One key feature of Python is string immutability. This enables several useful properties, such as strings being hashable and therefore usable as dictionary keys, but it has some downsides.

Immutable strings mean that any string manipulation, such as splitting or appending, makes a copy of the string data. This can become a performance problem, especially in a world where zero-copy is one of the favorite general optimization techniques. If you've done enough string manipulation, you're probably aware of the usual techniques for working around this.
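For illustration, here is a minimal sketch of two widely used ways to limit copying while building up a string: collecting the pieces in a list and joining once, and writing into an in-memory buffer. These are not necessarily the techniques the post originally had in mind. The helper names build_with_join and build_with_stringio are hypothetical, and the sketch assumes Python 3.

```python
import io


def build_with_join(parts):
    """Collect the pieces in a list and join them once at the end."""
    return ''.join(parts)


def build_with_stringio(parts):
    """Write the pieces into an in-memory text buffer, then read it back once."""
    buf = io.StringIO()
    for part in parts:
        buf.write(part)
    return buf.getvalue()


pieces = ['spam', ' ', 'eggs', ' ', 'ham']
joined = build_with_join(pieces)
buffered = build_with_stringio(pieces)

# Both approaches produce the same text.
print(joined == buffered)
```

Either approach avoids creating a new intermediate string for every append, which is what repeated "+" in a loop may do.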
But in some cases Python uses the immutability to avoid making copies:
>>> a = 'a' * 1024 * 1024  # a 1 megabyte string
>>> z = '' + a
>>> z is a
True
Here, because adding an empty string does not change the value, z is the same exact string object as a. And it doesn't matter how many times you append an empty string:
>>> z = '' + '' + '' + a
>>> z is a
True
It even works when a is the only item in a list:
>>> z = ''.join([a])
>>> z is a
True
But it falls apart when you put an empty string in the list with a:
>>> z = ''.join(['', a])
>>> z is a
False
And unfortunately even the first example seems to make a copy on PyPy:
>>>> a = 'a' * 1024 * 1024  # a 1 megabyte string again
>>>> z = '' + a
>>>> z is a
False
Although something more advanced may be going on under the covers, as is often the case with PyPy.
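To see how a particular interpreter behaves, the checks above can be collected into one small script. This is only a sketch, assuming Python 3; the function name probe_identity is hypothetical, and the results are implementation details that may differ between interpreters and versions.

```python
import platform


def probe_identity(size=1024 * 1024):
    """Map each expression to whether its result is the same object as a."""
    a = 'a' * size  # a 1 megabyte string by default
    return {
        "'' + a": ('' + a) is a,
        "''.join([a])": ''.join([a]) is a,
        "''.join(['', a])": ''.join(['', a]) is a,
    }


results = probe_identity()
print(platform.python_implementation())
for expr, same in results.items():
    print(expr, '->', same)
```

No expected output is shown on purpose: whether a copy is made is up to the runtime, not the language.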

I'm almost done stringing you along, but as a corollary reminder:
Never rely on "is" checks with ints, floats, and strings. "==" and other value checks are what you need. As a general rule, "is" is for checking object identity: None, sentinel objects, and sometimes True/False.
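A short sketch of the difference, assuming Python 3. The helper names is_missing and same_text are hypothetical and exist only for illustration.

```python
def is_missing(value):
    """Identity check: appropriate for the None singleton."""
    return value is None


def same_text(left, right):
    """Value check: the reliable way to compare strings."""
    return left == right


a = 'a' * 1024
b = ''.join(['', a])

print(same_text(a, b))   # True: the values are equal
print(a is b)            # implementation-dependent, do not rely on it
print(is_missing(None))  # True
```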

Keep on stringifying!

Mahmoud
http://sedimental.org/
https://github.com/mahmoud
https://twitter.com/mhashemi

90 comments:

  1. I think the CPython behaviour is an optimisation made possible by reference counting. CPython can tell, when adding strings, whether it holds the only reference, and can reuse the memory rather than copying in cases like the above. PyPy doesn't use reference counting, so it can't do the same trick, as I understand it.

    Replies
    1. While it's true that PyPy does not use CPython-style reference counting, I don't think that implies that PyPy can't have this optimization, per se. PyPy still has immutable strings, as that's a property of the Python language, not the runtime.

  2. This could be related to interned strings: you can force any string to be unique and in a global table using the intern() function.

    Replies
    1. You are correct about intern() and that may be the culprit with shorter strings, but you'll notice I created a 1MB string to start with. Very much hope our relatively lightweight CPython doesn't magically intern that :)
