Taking apart StringBuilder (Java)

If you are a Java developer, you almost certainly know that it is better to use a string builder than to add strings together. You may also know that you should use StringBuilder rather than StringBuffer, because the latter is thread-safe, which means it carries a synchronization overhead that is wasted if you don’t need it (and you likely don’t).

But how exactly is StringBuilder so much better, nay, more awesome than just adding strings together? Well, remember that in Java strings are immutable: once a string has been created it cannot be changed. This has all sorts of benefits, but it also means that we have to create new strings if we want new values.

Let's back up a second: what exactly does Java do when you write some code like this?

 String message = "Greetings " + title + " " + firstName + " " + lastName; 

The answer? Taken at face value, it creates and throws away an entire bunch of strings! First it creates a temporary string that contains the result of "Greetings " + title (it also needs a string containing "Greetings ", but Java has that optimised so creating it is cheap, and there is no way around it anyhow). Then it creates a new temporary string that contains the result of adding the previous temporary and the space. At this point the first temporary is garbage and will need to be collected. With the second temporary in hand, Java creates a third, sets it to the result of adding the second temporary and the firstName variable, and so on. (Strictly speaking, the compiler is allowed to optimise a single expression like this one, and javac does; the step-by-step picture is what you actually get when a string is built up across several statements or loop iterations.)

In all, not counting the string literals in the code, Java creates 5 strings, and all but the last are garbage right away. That assumes this isn’t inside a loop; if it is, it will create 5 garbage strings for each iteration of the loop. This can be done better, and that is what StringBuilder does.
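
To make the loop case concrete, here is a small sketch comparing the two approaches. The class and method names (ConcatDemo, joinWithPlus, joinWithBuilder) are made up for illustration. Both methods return the same text, but the first one builds a brand new String on every pass through the loop, while the second keeps appending to one builder.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the names are illustrative, not from the JDK.
class ConcatDemo {

    // Repeated += produces a fresh String on each iteration;
    // every intermediate result becomes garbage right away.
    static String joinWithPlus(List<String> words) {
        String result = "";
        for (String word : words) {
            result += word + " ";
        }
        return result;
    }

    // One StringBuilder collects the characters; only the final
    // toString() call creates a String.
    static String joinWithBuilder(List<String> words) {
        StringBuilder builder = new StringBuilder();
        for (String word : words) {
            builder.append(word).append(" ");
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Greetings", "Dr.", "Ada", "Lovelace");
        System.out.println(joinWithPlus(words));
        System.out.println(joinWithBuilder(words));
    }
}
```

Both calls print the same line; the difference is only in how much garbage is produced along the way.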

You can find the Javadoc here and the source code here; it might help you to have the source code open in a new tab (or a new window). Much of what we will be looking at is in the AbstractStringBuilder class (source code here), so I suggest opening that one as well.

Right then, let's rewrite the previous example to use a StringBuilder:

    StringBuilder strBuilder = new StringBuilder();
    strBuilder.append("Greetings ").append(title).append(" ")
              .append(firstName).append(" ").append(lastName);
    String message = strBuilder.toString();

Most uses of StringBuilder involve just two methods: append and toString. So let's look at the append(String str) method:

    public AbstractStringBuilder append(String str) {
        if (str == null) str = "null";
        int len = str.length();
        if (len == 0) return this;
        int newCount = count + len;
        if (newCount > value.length)
            expandCapacity(newCount);
        str.getChars(0, len, value, count);
        count = newCount;
        return this;
    }

If you have read my entry on ArrayLists nothing here should be surprising: StringBuilder is basically backed by a big char array that it then copies the chars of the string into.

The expandCapacity method? It (surprise) implements the logic to expand the backing store and serves as an illustration of why you shouldn’t reimplement something like that if you don’t have to:

    void expandCapacity(int minimumCapacity) {
        int newCapacity = (value.length + 1) * 2;
        if (newCapacity < 0) {
            newCapacity = Integer.MAX_VALUE;
        } else if (minimumCapacity > newCapacity) {
            newCapacity = minimumCapacity;
        }
        value = Arrays.copyOf(value, newCapacity);
    }

Here the new capacity will always be larger than all of its previous capacities combined, which can cause heap fragmentation, since the space freed by the old arrays is never big enough to hold the next one. If you read my article on the ArrayList you can see the details; the important part is that this wouldn’t have been an issue if the implementation had used ArrayList. Unfortunately that isn’t possible, because char is a primitive type in Java and storing chars in an ArrayList would cause needless boxing.
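
To see the growth pattern in numbers, the sketch below applies the same (length + 1) * 2 rule from the snippet above, starting from a capacity of 16, which is what a StringBuilder created with the no-argument constructor starts with. GrowthDemo and nextCapacity are hypothetical names, and the overflow and minimumCapacity branches are left out to keep it short.

```java
// Hypothetical demo class, not part of the JDK.
class GrowthDemo {

    // Same growth rule as the expandCapacity snippet, without the
    // overflow check and the minimumCapacity fallback.
    static int nextCapacity(int current) {
        return (current + 1) * 2;
    }

    public static void main(String[] args) {
        int capacity = 16;       // default StringBuilder capacity
        long sumOfPrevious = 0;  // combined size of all arrays used so far
        for (int step = 0; step < 5; step++) {
            sumOfPrevious += capacity;
            int next = nextCapacity(capacity);
            System.out.println(capacity + " -> " + next
                    + " (earlier arrays combined: " + sumOfPrevious + ")");
            capacity = next;
        }
    }
}
```

The capacities go 16, 34, 70, 142, 286, 574, and at every step the new array is bigger than all the earlier ones put together (for example 70 against 16 + 34 = 50).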

Okay, enough digression: how is this more efficient than allocating a ton of strings? Simple: as long as there is enough space in the backing store, we can add characters without creating any new objects, which means we also avoid creating any garbage.
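
One practical consequence: if you have a rough idea of how long the final string will be, you can pass that to the constructor so the backing array never has to grow. Here is a small sketch of that idea. CapacityDemo and capacityAfterAppends are made-up names; the StringBuilder(int) constructor and the capacity() method are standard.

```java
// Hypothetical demo class, not part of the JDK.
class CapacityDemo {

    // Appends a 26-character message and reports the builder's capacity afterwards.
    static int capacityAfterAppends(StringBuilder builder) {
        builder.append("Greetings ").append("Dr.").append(" ")
               .append("Ada").append(" ").append("Lovelace");
        return builder.capacity();
    }

    public static void main(String[] args) {
        // 64 chars is more than enough, so the backing array is never replaced.
        StringBuilder presized = new StringBuilder(64);
        System.out.println(capacityAfterAppends(presized));

        // The default capacity of 16 is too small for this message,
        // so at least one expansion (and one discarded array) is needed.
        StringBuilder defaultSized = new StringBuilder();
        System.out.println(capacityAfterAppends(defaultSized));
    }
}
```

The first call reports 64, unchanged from what we asked for. The second reports something larger than 16; the exact number depends on the growth rule of the JDK version you run it on.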

StringBuilder has append methods for just about anything, but they all work the same way: get the chars that correspond to the string representation of the thing we want to add and put them at the end of the backing array. StringBuilder also has insert methods, but those aren’t used as often so we will pretend they don’t exist.
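
As a quick illustration of those overloads, the sketch below chains a few of them. AppendDemo and describe are made-up names; the append(int), append(double), append(boolean) and append(Object) overloads are part of the standard class.

```java
// Hypothetical demo class, not part of the JDK.
class AppendDemo {

    static String describe(int count, double price, boolean inStock, Object extra) {
        return new StringBuilder()
                .append(count)            // append(int)
                .append(" items at ")
                .append(price)            // append(double)
                .append(", in stock: ")
                .append(inStock)          // append(boolean)
                .append(", extra: ")
                .append(extra)            // append(Object); a null reference is appended as "null"
                .toString();
    }

    public static void main(String[] args) {
        // prints "3 items at 9.5, in stock: true, extra: null"
        System.out.println(describe(3, 9.5, true, null));
    }
}
```

Note how the null ends up as the four characters "null", just like in the append(String str) method shown earlier.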

Okay so now we have added all the strings, how do we get them back out?

    public String toString() {
        // Create a copy, don't share the array
        return new String(value, 0, count);
    }

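Here we create one String object using a less common constructor, String(char[] value, int offset, int count), which copies just the count characters that are in use out of the backing array.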
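
Because toString copies rather than shares, a String you got from the builder stays put even if you keep appending afterwards. A minimal sketch, with SnapshotDemo and snapshots as hypothetical names:

```java
// Hypothetical demo class, not part of the JDK.
class SnapshotDemo {

    static String[] snapshots() {
        StringBuilder builder = new StringBuilder("Greetings");
        String before = builder.toString();  // copies the current contents
        builder.append(" again");            // later changes only touch the builder
        String after = builder.toString();
        return new String[] { before, after };
    }

    public static void main(String[] args) {
        String[] result = snapshots();
        System.out.println(result[0]);  // prints "Greetings"
        System.out.println(result[1]);  // prints "Greetings again"
    }
}
```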

And that is pretty much it actually. A relatively simple implementation which gives a huge speed bonus.