Austin Z. Henley

I work on software.



Python strings are immutable, but only sometimes

2/15/2021

Update 2/16/2021: See the discussion of this post on Hacker News.

The standard wisdom is that Python strings are immutable. You can't change a string's value; you can only rebind the variable to a different string. Like so:

x = "hello"
x = "goodbye"  # New string!

This implies that each time you make a change to a string variable, you are actually producing a brand new string. Because of this, tutorials warn you to avoid string concatenation inside a loop and advise using join instead for performance reasons. Even the official documentation says so!

This is wrong. Sort of.

There is a common case in which strings in Python are actually mutable. I will show you an example by inspecting the string object's ID using the builtin id() function, which in CPython is just the memory address. No two objects that are alive at the same time share an ID. (A single object can be shared by several variables though, such as with interning.)
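A minimal sketch of this kind of check, assuming CPython. The helper name watch_ids is made up for illustration. The string is built at runtime so that it is not an interned literal, and the appends happen on a local variable inside a function. Whether a given step keeps the same address depends on the CPython version and the memory allocator, so treat the printed values as something to observe, not a guarantee.

```python
def watch_ids(steps=8):
    # Build the string at runtime so it is not an interned literal.
    s = "".join(["py", "thon"])
    results = []
    for _ in range(steps):
        before = id(s)   # id() returns an int, so this holds no reference to s
        s += "!"         # s is the only reference to the old string
        results.append((len(s), id(s) == before))
    return s, results

s, results = watch_ids()
for length, same_address in results:
    # same_address is True when the append did not move the object
    print(length, same_address)
```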

I concatenated two strings but the memory address did not change!

For an extra sanity check, let's make our own "pointer" and see if it points to the original or modified string.
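One way to sketch such a "pointer" is with ctypes, assuming CPython, where id() is the object's address. The function name peek_through_address is hypothetical. Reading an address after the object has moved would be unsafe, so this version only dereferences the saved address when the append left the string where it was, and gives up with None otherwise.

```python
import ctypes

def peek_through_address(max_tries=100):
    a = "".join(["some", " ", "text"])  # runtime-built, so not interned
    for _ in range(max_tries):
        address = id(a)                 # CPython: the memory address of a
        a += "!"
        if id(a) == address:
            # The string was not moved, so the saved address is still valid.
            # Reinterpret the raw address as a Python object and read it back.
            b = ctypes.cast(address, ctypes.py_object).value
            return a, b
    # Every append moved the string; the old addresses may be dangling.
    return a, None

a, b = peek_through_address()
print(a)
print(b)
```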

If strings were truly immutable, then the address we stored in b would point to the original string. However, we see that the strings printed are equivalent.

We can try another test to see how often we get a new string object by doing 10,000 small concatenations.
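Here is a rough sketch of that experiment, again assuming CPython. The name count_new_objects is made up for illustration. It compares id() before and after each append and counts how often the address changed. The exact count will vary with the interpreter version, the platform, and the allocator.

```python
def count_new_objects(n=10_000):
    s = ""
    moves = 0
    for _ in range(n):
        before = id(s)
        s += "a"
        if id(s) != before:
            # A different address means a new object was allocated
            # (or the existing one was moved while growing).
            moves += 1
    return s, moves

s, moves = count_new_objects()
print(f"{moves} of {len(s)} concatenations produced a new address")
```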

Only 46 of the 10,000 concatenations allocated a new string!

So what is going on here?

CPython is clever. If there are no other references to the string, then it will attempt to mutate the string instead of allocating a new one, though it will sometimes need to resize the buffer if the string grows too big, much like C++'s vector or C#'s List.
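The "no other references" condition is easy to see from the other side. In this small sketch (the function name append_with_alias is hypothetical), a second variable keeps the original string alive, so the append has to produce a separate object and the alias still sees the old value.

```python
def append_with_alias():
    s = "".join(["shared", " ", "text"])  # runtime-built string
    alias = s   # a second reference to the same object
    s += "!"    # the object alias refers to must stay unchanged
    return s, alias

s, alias = append_with_alias()
print(s)            # the new string
print(alias)        # still the original text
print(s is alias)   # False: two distinct objects now
```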

We can see the first clue in CPython's Python/ceval.c file. When doing a string concatenation, it tries to reduce the left operand's reference count to 1 if possible.

Now take a look at PyUnicode_Append() in Objects/unicodeobject.c. If the string is modifiable (has only 1 reference and is not interned), then it will try to append in place! unicode_resize() will see if it will fit or if a new allocation needs to be made.

There we have it. Evidence that you can mutate a string in Python.


Does this really count as mutability though? Not really. The string would be thrown away anyway so this is just an optimization to reuse the memory. The important takeaway here is that you are not allocating a new string every single time like the internet says.