Thursday, May 5, 2016

String optimization in Python

Strings are terribly important in programming. A program without some form of string input, manipulation, and output is a rarity.

Of course, this means that speed and sanity around string features are important. One important feature of Python is string immutability. Immutability enables useful features, such as using strings as dictionary keys, but there are some downsides.

Immutable strings mean that any string manipulation, such as splitting or appending, makes a copy of that string. This can become a performance problem, especially in a world where zero-copy is one of the favorite general optimization techniques. If you've done enough string manipulation, you're probably aware of the usual techniques for working around this.
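As a rough illustration (my own example, not a list from the original post), one commonly cited technique is to collect the pieces in a list and join them once, rather than appending piece by piece. The function names below are hypothetical, and how much copying each approach actually does depends on the Python implementation:

```python
def build_with_concat(parts):
    """Append piece by piece; each += may produce a new string object."""
    result = ''
    for part in parts:
        result += part
    return result


def build_with_join(parts):
    """Hand all the pieces to str.join, which builds the result in one call."""
    return ''.join(parts)


pieces = ['spam', ' ', 'eggs', ' ', 'ham']

# Both approaches produce the same value; only the amount of copying differs.
assert build_with_concat(pieces) == build_with_join(pieces)
print(build_with_join(pieces))  # spam eggs ham
```

Both functions return equal strings, so switching between them is safe as far as the result goes.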
But in some cases Python uses the immutability to avoid making copies:
>>> a = 'a' * 1024 * 1024  # a 1 megabyte string
>>> z = '' + a
>>> z is a
True
Here, because adding an empty string does not change the value, z is the exact same string object as a. And it doesn't matter how many times you prepend an empty string:
>>> z = '' + '' + '' + a
>>> z is a
True
It even works when a is the only item in a list:
>>> z = ''.join([a])
>>> z is a
True
But it falls apart when you put an empty string in the list with a:
>>> z = ''.join(['', a])
>>> z is a
False
And unfortunately even the first example seems to make a copy on PyPy:
>>>> a = 'a' * 1024 * 1024  # a 1 megabyte string again
>>>> z = '' + a
>>>> z is a 
False 
Although something more advanced may be going on under the covers, as is often the case with PyPy.
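If you want to see how your own interpreter behaves, here is a small sketch that reruns the checks above and prints which interpreter it ran on. The helper name identity_report is made up for this example, and the results are implementation details that can differ between interpreters and versions:

```python
import platform


def identity_report(a):
    """Map each expression from the post to whether it returns the same object as a."""
    return {
        "'' + a": ('' + a) is a,
        "'' + '' + '' + a": ('' + '' + '' + a) is a,
        "''.join([a])": ''.join([a]) is a,
        "''.join(['', a])": ''.join(['', a]) is a,
    }


big = 'a' * 1024 * 1024  # a 1 megabyte string, as in the examples

# Prints the implementation name, for example CPython or PyPy.
print(platform.python_implementation())
for expression, same_object in identity_report(big).items():
    print(expression, '->', same_object)
```

Whatever the identity results turn out to be, every one of these expressions is still equal to a by value.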

I'm almost done stringing you along, but as a corollary reminder:
Never rely on "is" checks with ints, floats, and strings. "==" and other value checks are what you need. As a general rule, "is" is for objects, None, and sometimes True/False.
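A minimal sketch of that rule, using two hypothetical helper functions: compare strings by value with "==", and save "is" for things like None.

```python
def values_match(x, y):
    """Compare by value; this is the check to use for strings and numbers."""
    return x == y


def is_missing(x):
    """Identity check against None, one of the cases where 'is' is appropriate."""
    return x is None


first = 'a' * 1024
second = ''.join(['a'] * 1024)  # same characters, built a different way

print(values_match(first, second))  # True: the values are equal
print(first is second)              # implementation detail; do not rely on it
print(is_missing(None))             # True
```

The middle line is the one to be wary of: its result says something about the interpreter, not about your data.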

Keep on stringifying!

Mahmoud
http://sedimental.org/
https://github.com/mahmoud
https://twitter.com/mhashemi

13 comments:

  1. I think the CPython behaviour is an optimisation possible because of reference counting. CPython can tell when adding strings if it's the only reference, and can reuse the memory rather than copying in cases like above. PyPy doesn't use reference counting, so can't do the same trick, aiui.

    Replies
    1. While it's true that PyPy does not use CPython-style reference counting, I don't think that implies that PyPy can't have this optimization, per se. PyPy still has immutable strings, as that's a property of the Python language, not the runtime.

  2. This could be related to interned strings: you can force any string to be unique and in a global table using the intern() function.

    Replies
    1. You are correct about intern() and that may be the culprit with shorter strings, but you'll notice I created a 1MB string to start with. Very much hope our relatively lightweight CPython doesn't magically intern that :)
