The slow way using combine and strings
Building long strings in the Python programming language can sometimes result in very slow running code. In this article I investigate the computational performance of various string concatenation methods.
In Python the string object is immutable - each time a string is assigned to a variable a new object is created in memory to represent the new value. This contrasts with languages like Perl and BASIC, where a string variable can be modified in place.
The common operation of constructing a long string out of several short segments is not very efficient in Python if you use the obvious approach of appending new segments to the end of the existing string.
Each time you append to the end of a string, the Python interpreter must create a new string object and copy the contents of both the existing string and the appended string into it. As the strings you are manipulating become large this process becomes increasingly slow. What other methods are available and how does their performance compare? I decided to test several different approaches to constructing very long strings to see how much they vary in efficiency.
For this comparison I required a test problem that calls for the construction of very long strings out of a large number of segments. It should not do much other computation, so that the measured performance is dependent only on the string operation performance. The test case I used is to concatenate a series of integers from zero up to some large number.
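One can easily vary this number to vary the size of the produced strings. For example, the first 20 integers would give us a string like this: 012345678910111213141516171819.

Method 1 is naive appending: each new segment is added to the end of the existing string. To me this is the most obvious approach to the problem. I converted each integer with backticks, the Python 2 shorthand for repr. You can accomplish the same thing with the str function, but that ended up being somewhat slower, so I stuck with the backticks for all my methods.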
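As a rough illustration of the naive approach, here is a minimal sketch in Python 3. It is not the author's original code: the function name `naive_concat` is hypothetical, and `str` stands in for the backticks because that syntax no longer exists in Python 3.

```python
def naive_concat(count):
    """Build one long string by appending each integer's digits in turn."""
    out = ""
    for num in range(count):
        # Strings are immutable, so each append may allocate a new
        # string and copy everything built so far into it.
        out += str(num)
    return out


print(naive_concat(20))  # 012345678910111213141516171819
```

Changing the `count` argument is how the size of the test string would be varied.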
As I mentioned, although this method is obvious it is not at all efficient. You can see in the results below how few string operations per second it managed. If you need to do lots of concatenations, this is not the right way to go about it.
Method 2 uses a class from the Python library called MutableString. According to the documentation the class is for educational purposes. One might think that an append operator on a mutable string would not reallocate or copy strings. In the test this method performed even worse than Method 1. The source code for UserString explains why: concatenations using this class aren't going to be any faster than normal immutable string operations, and indeed the extra overhead of interpreting the MutableString class methods makes this approach a good deal slower.

I almost didn't try Method 3 at all, but I had seen it suggested on a mailing list, so I decided to give it a whirl. The idea is to use an array of characters to store the string. Arrays are mutable in Python, so they can be modified in place without copying the existing array contents. In this case we're not interested in changing existing array elements. We just want to add new array elements at the end of the array. The fromstring call appends the string character by character into the existing array.
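The array idea can be sketched in Python 3 as follows. This is an approximation under stated assumptions rather than the original code: the `fromstring` call named above is not available in current Python 3 releases, so this sketch uses a byte array with `frombytes` and `tobytes` instead, and the name `array_concat` is hypothetical.

```python
from array import array


def array_concat(count):
    """Append each integer's ASCII digits to a growable byte array."""
    buf = array("B")  # typecode "B": unsigned bytes, extended in place
    for num in range(count):
        # frombytes appends the given bytes to the end of the array.
        buf.frombytes(str(num).encode("ascii"))
    # Convert to a string once, after all the appends are done.
    return buf.tobytes().decode("ascii")


print(array_concat(20))  # 012345678910111213141516171819
```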
This approach, Method 4, is commonly suggested as a very pythonic way to do string concatenation. First a list is built containing each of the component strings, then in a single join operation a string is constructed containing all of the list elements appended together.
There's a funny-looking Python idiom on the last line - we call the join method of the object identified by the empty string. Not too many languages will let you call methods on a string literal.
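A minimal sketch of the build-a-list-then-join approach, written for Python 3. The function names are hypothetical. The second function shows an equivalent spelling of the same join that does not call a method on a string literal.

```python
def list_join_concat(count):
    """Collect the pieces in a list, then join them in one operation."""
    parts = []
    for num in range(count):
        parts.append(str(num))
    # join is called on the separator, here the empty string.
    return "".join(parts)


def list_join_concat_alt(count):
    """Same result, calling join through the str type instead."""
    parts = [str(num) for num in range(count)]
    return str.join("", parts)


print(list_join_concat(20))  # 012345678910111213141516171819
```

Because the pieces live in an ordinary list until the final join, they can still be inserted, deleted, or reordered before the string is built.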
If you find that offensive, you can write the call another way instead.

Method 5 uses the cStringIO module, which provides an object that behaves like a file. Obviously it's easy to append to a file - you simply write at the end of it - and the same is true for this module. It should be pretty speedy. Using this object we can build our string one write at a time and then collect the result using the getvalue call. Interestingly enough, string objects in Java are immutable just like Python's.
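The write-then-getvalue pattern might look like the sketch below. It assumes Python 3, where `io.StringIO` serves as a stand-in for the Python 2 cStringIO module that the article measured. The function name is hypothetical.

```python
from io import StringIO


def stringio_concat(count):
    """Write each piece to an in-memory file object, then read it back."""
    buf = StringIO()
    for num in range(count):
        buf.write(str(num))  # each write appends at the current end
    return buf.getvalue()    # collect the whole string in one call


print(stringio_concat(20))  # 012345678910111213141516171819
```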
In Java there is a class called StringBuffer. This is a bit more powerful than either the Python StringIO or the array approach, because it supports inserting and removing sub-strings as well as appending them.

Method 6 is the shortest. I'll spoil the surprise and tell you it's also the fastest. It's extremely compact, and also pretty understandable.
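For reference, a list comprehension combined with join can be sketched in a single expression. This is a Python 3 approximation with a hypothetical function name, using `str` where the article used backticks.

```python
def comprehension_concat(count):
    """Build the list of pieces and join them in one expression."""
    return "".join([str(num) for num in range(count)])


print(comprehension_concat(20))  # 012345678910111213141516171819
```

The whole list of pieces exists in memory before the join runs, which is worth keeping in mind when the input is large.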
Create a list of numbers using a list comprehension and then join them all together. Couldn't be simpler than that. This is really just an abbreviated version of Method 4, and it consumes pretty much the same amount of memory. It's faster though, because we don't have to call a list method for every element.

I wanted to look at both the length of time taken to build the string and the amount of memory used by the Python interpreter during the computation.
Although memory is cheap, there are a couple of reasons why it can be an important factor. The Python program may be running on a system that imposes fixed resource limits.
For example in a shared web hosting environment, the machine may be configured to limit the memory size of each process. Typically the kernel will kill a process whose memory exceeds the quota. That would be annoying for a CGI script, and really unfortunate for a long-lived server process.
So in those cases keeping memory use from expanding unpredictably is important. The other reason is that when you're dealing with very large strings, having the interpreter's memory allocation grow too large could cause the virtual memory system to start paging the process out to disk.
Then performance will really go downhill. It doesn't matter if you find the fastest algorithm in the world - if it uses too much memory it will run slow as a dog.
If we use an algorithm that uses less memory, the chances of paging are reduced and we will have more predictable performance.

I ran each of the methods as a separate test using its own Python process. I ran these tests using Python 2. Next I tried a run of each method using a far larger number of integers, concatenated into a string more than 2,000 kB long. This is a much more serious test and we start to see the size of the Python interpreter process grow to accommodate the data structures used in the computation.
I didn't even bother to try to run this test to completion using Methods 1 and 2. It would take many minutes to concatenate half a million integers using these methods.
That's not too surprising - the string representation of each integer is a little longer in this test - usually five digits instead of four. In the first test Method 3 performed ten times better than our first two methods, but it didn't scale that well on the longer test. It did however use less space than any of the other reasonable methods. Clearly Python is doing a great job of storing the array efficiently and garbage collecting the temporary strings in this case.
The performance of Method 4 is more than twenty times better than naive appending in the shorter test, and it also does pretty well on the longer test. Interestingly, Method 5 did better in the longer test. Method 6 is still the overall winner, but Method 5 is now doing more concatenations per second and has almost caught up with Method 6. We can guess that if we went to an even longer running test, Method 5 would surpass Method 6.
Notice also the differences in process sizes. At the end of the computation for Method 6 the interpreter is using more than 22,000 kB of memory, eight times the size of the string it is computing, whereas Methods 3 and 5 use less than half that much.

I would use Method 6 in most real programs. It's fast and it's easy to understand. It does require that you be able to write a single expression that returns each of the values to append.
Sometimes that's just not convenient to do - for example when there are several different chunks of code that are generating output. In those cases you can pick between Method 4 and Method 5. Method 4 wins for flexibility. You can use all of the normal slice operations on your list of strings, for insertions, deletions and modifications.
The performance for appending is pretty decent. Method 5 wins out on efficiency. If you're doing a lot of string appending, cStringIO is the way to go.

Measuring the time taken to run each method was relatively easy. I used the Python library timing module to measure elapsed time. I didn't attempt to measure the CPU time used by the Python process as opposed to other processes running on the machine, but the machine was idle at the time, so I don't think this would make much difference.
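An elapsed-time measurement along these lines could be sketched as below. This is not the author's harness. It assumes Python 3 and uses `time.perf_counter` in place of the old timing module. The names `ops_per_second` and `join_numbers` and the count of 20000 are all arbitrary choices for illustration.

```python
import time


def join_numbers(count):
    """Example workload: join the digits of the first `count` integers."""
    return "".join([str(n) for n in range(count)])


def ops_per_second(build, count):
    """Time one call of build(count) and report concatenations per second."""
    start = time.perf_counter()
    result = build(count)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    rate = count / elapsed if elapsed > 0 else float("inf")
    return rate, len(result)


rate, length = ops_per_second(join_numbers, 20000)
print(f"{rate:.0f} concatenations per second, {length} characters built")
```

Like the measurement described above, this records wall-clock time rather than CPU time, so it is only meaningful on an otherwise idle machine.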
Measuring memory used was a little trickier. Python doesn't currently provide a way to monitor the size of allocated objects, so I instead used the Unix 'ps' command to monitor the allocated process size. Since process size varies during execution I wanted to measure the maximum allocated memory. To do that I ran the 'ps' process right as the computation finishes. The value 15 would probably need to be changed for different versions of ps.

I tried using range instead of xrange to pre-calculate the list of numbers.
Somewhat surprisingly range ended up being slightly faster in every case. Armin Rigo has recently argued that xrange could be eliminated as a separate language construct if the interpreter were smart enough to return an object that uses the appropriate backing storage (iterator or list) depending on the context. I find this argument compelling from a language design perspective, although I have no idea how hard such an optimization would be to implement.
I'd love to do a comparison of other programming languages on this same task.
If you are coming to Go from another language like Ruby or Python there are a lot of changes to take in, and many of these revolve around the string type. Below is a list of some quick tips that answered questions I had during my first few weeks using Golang.
Creating a multiline string in Go is actually incredibly easy. But be careful - any spacing you use in the string to retain indentation will also be present in the final string.

While the approach described here still works, later Go 1.x releases offer other options. When concatenating many strings it is much more efficient to use a bytes.Buffer and then convert it to a string once you have concatenated everything. You can also use the strings.Join function if you have all of the strings ahead of time.

Unfortunately, if you try to do what seems obvious in Go, like casting an int to a string, you are unlikely to get what you expected.
What would you expect the resulting string s to be? If you guess the integer's decimal digits, like most people would, you would sadly be mistaken. Instead you should look to use packages like strconv or functions like fmt.Sprintf. For example, strconv.Itoa converts an integer into a string. You can also use the fmt.Sprintf function to convert pretty much any data type into a string, but this should generally be reserved for instances where you are actually creating strings with embedded data, not when you want to convert a single integer into a string.
fmt.Sprintf operates pretty much identically to fmt.Printf, except that instead of printing the resulting string to standard output it returns it as a string. As I mentioned before, fmt.Sprintf should typically be reserved for creating strings with embedded values.
There are a few reasons for this, and the most prominent one has to do with fmt.Sprintf itself.

Many languages like Ruby and Python provide some helpers that make generating a random string really easy, so surely Go has one, right? Unfortunately not. Go opted instead to provide only the tools to create random strings and left the details up to the developer.
While this might be a turn off at first, the upside is that you get to completely dictate how the string is generated. This means you can dictate the character set, how your random generation is seeded, and any other pertinent details. In short, you have more control but at the cost of needing to write a little extra code.
This happens because the Go Playground always uses the same time, so when we created our source with the rand.NewSource function and passed in the current time that value is always the same, so our generated strings will always be the same. There are likely more optimal solutions than this one for your particular needs, but it is a good starting point.
Regardless of what you end up using, this example should help get you started. Just be sure to remember to seed your random number generator! You can do this with the rand.Seed function, or by creating a source. I opted to create a source in the example above.
When dealing with strings it is incredibly common to want to figure out if a string starts with or ends with a specific string. For functions that sound like very common use cases, your best bet is often to head on over to the strings package and check for something that might help you out.
In this case you would want to use the functions strings.HasPrefix(str, prefix) and strings.HasSuffix(str, suffix).

If you are picking up Go after having experience with another language, one common mistake I see is developers spending too much time searching for packages that provide the functionality that they need when they could have easily just written the code themselves. There are definitely perks to using a standard library; for example, its functions are tested thoroughly and are well documented.
Despite those perks, if you find yourself spending more than a few minutes looking for a function it is often just as beneficial to write it yourself.

In Go you can convert a string into a byte slice ([]byte) and a byte slice into a string. Doing this is very easy and looks like any other type conversion. This conversion is often performed in order to pass a string into a function that accepts a byte slice, or to pass a byte slice into a function that expects a string.

I hope you found these helpful and informative, and be sure to check out some of my courses like Gophercises (info below) if you want to practice your Go a bit more.
Gophercises is a FREE course where we work on exercise problems that are each designed to teach you a different aspect of Go. This includes topics ranging from basic string manipulation all the way to more advanced topics like functional options and concurrency.
Each exercise has a sample solution, as well as a screencast video where I code the solution while walking you through the code.
Jon Calhoun is a full stack web developer who also teaches about Go, web development, algorithms, and anything programming related. He also consults for other companies who have development needs.
Jon is a co-founder of EasyPost, a shipping API that many Fortune companies use to power their shipping infrastructure, and prior to founding EasyPost he worked at Google as a software engineer.