Thursday, May 5, 2016

String optimization in Python

Strings are terribly important in programming. A program without some form of string input, manipulation, and output is a rarity.

Of course, this means that the speed and sanity of string features are important. One important feature of Python is string immutability. This enables useful behavior, such as using strings as dictionary keys, but there are some downsides.

Immutable strings mean that any string manipulation, such as splitting or appending, makes a copy of that string. This can become a performance problem, especially in a world where zero-copy is one of the favorite general optimization techniques. If you've done enough string manipulation, you're probably aware of the usual techniques for limiting those copies.
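
As one illustration of the kind of technique meant here (my assumption, not necessarily what the author had in mind), pieces can be collected and joined once instead of appended one at a time. The helper names below are hypothetical, and this is only a sketch:

```python
def concat_naive(pieces):
    """Append piece by piece; each += may copy everything built so far."""
    result = ''
    for piece in pieces:
        result += piece
    return result


def concat_join(pieces):
    """Build the result with a single join call."""
    return ''.join(pieces)


if __name__ == '__main__':
    words = ['spam', ' ', 'eggs']
    # Both approaches produce the same value; only the copying differs.
    print(concat_naive(words) == concat_join(words))
```

Both functions return equal strings. How much the first one actually copies depends on the interpreter, so it is worth measuring before rewriting working code.
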
But in some cases CPython uses the immutability to avoid making copies:
>>> a = 'a' * 1024 * 1024  # a 1 megabyte string
>>> z = '' + a
>>> z is a
True
Here, because adding an empty string does not change the value, z is the same exact string object as a. And it doesn't matter how many times you append an empty string:
>>> z = '' + '' + '' + a
>>> z is a
True
It even works when a is the only item in a list:
>>> z = ''.join([a])
>>> z is a
True
But it falls apart when you put an empty string in the list with a:
>>> z = ''.join(['', a])
>>> z is a
False
And unfortunately even the first example seems to make a copy on PyPy:
>>>> a = 'a' * 1024 * 1024  # a 1 megabyte string again
>>>> z = '' + a
>>>> z is a 
False 
Although something more advanced may be going on under the covers, as is often the case with PyPy.
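
Because the results above differ between interpreters, it can help to re-run the same checks on whichever one is at hand. The following is a minimal sketch; `identity_report` is a hypothetical helper name. Only the equality column is guaranteed by the language, while the identity column is an implementation detail that may change between interpreters and versions.

```python
def identity_report(s):
    """Map each expression from the sessions above to (equal, identical)."""
    candidates = {
        "'' + s": '' + s,
        "''.join([s])": ''.join([s]),
        "''.join(['', s])": ''.join(['', s]),
    }
    return {
        label: (value == s, value is s)
        for label, value in candidates.items()
    }


if __name__ == '__main__':
    big = 'a' * 1024 * 1024  # a 1 megabyte string, as in the post
    for label, (equal, identical) in identity_report(big).items():
        print(label, 'equal:', equal, 'identical:', identical)
```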

I'm almost done stringing you along, but as a corollary reminder:
Never rely on "is" checks with ints, floats, and strings. "==" and other value checks are what you need. As a general rule, "is" is for objects, None, and sometimes True/False.
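
A short sketch of that rule in practice, with a hypothetical `greeting` function and `ADMIN` constant made up for illustration: the identity check is reserved for None, and the string is compared by value.

```python
ADMIN = 'admin'


def greeting(username=None):
    """Pick a greeting, using "is" only for None and "==" for strings."""
    if username is None:       # identity check: None is a singleton
        return 'hello, stranger'
    if username == ADMIN:      # value check: works for any equal string
        return 'hello, boss'
    return 'hello, ' + username


if __name__ == '__main__':
    # A string built at runtime need not be the same object as ADMIN,
    # but it still compares equal by value.
    print(greeting(''.join(['ad', 'min'])))
    print(greeting())
```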

Keep on stringifying!

Mahmoud
http://sedimental.org/
https://github.com/mahmoud
https://twitter.com/mhashemi

67 comments:

  1. I think the CPython behaviour is an optimisation made possible by reference counting. CPython can tell when adding strings whether it holds the only reference, and can reuse the memory rather than copying in cases like the above. PyPy doesn't use reference counting, so it can't do the same trick, AIUI.

    1. While it's true that PyPy does not use CPython-style reference counting, I don't think that implies that PyPy can't have this optimization, per se. PyPy still has immutable strings, as that's a property of the Python language, not the runtime.

  2. This could be related to interned strings: you can force any string to be unique and in a global table using the intern() function.

    1. You are correct about intern() and that may be the culprit with shorter strings, but you'll notice I created a 1MB string to start with. Very much hope our relatively lightweight CPython doesn't magically intern that :)
