Let's Talk About Strings
It seems to me that the recent debate on immutable strings is misplaced. The debate appears to stem from two facts:
- strings are not duplicated, for efficiency reasons.
- since strings are mutable, this sharing creates more aliasing than we would care to have.
Nowhere have I seen it mentioned that we are only rarely interested in a string's identity, i.e. that we rarely want to share a string in order to allow elegant change propagation through aliasing.
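The unwanted aliasing described above can be sketched in a few lines. This is only an illustration, not Eiffel code: Python's own `str` is immutable, so `bytearray` stands in for a mutable string, and the `Person` class is a hypothetical name invented for the example.

```python
# Illustration only: bytearray plays the role of a mutable string,
# and Person is a hypothetical class made up for this sketch.

class Person:
    def __init__(self, name):
        # Keep the reference we were given; no copy is made.
        # This is the "efficient" behaviour that leads to aliasing.
        self.name = name


family_name = bytearray(b"Smith")
alice = Person(family_name)
bob = Person(family_name)        # both attributes now alias one object

alice.name.extend(b"-Jones")     # in-place change meant for alice only

# bob's name changed too, although nobody asked for that.
bob_was_affected = bob.name == bytearray(b"Smith-Jones")

# A defensive copy gives value-like behaviour, at the cost of a duplication.
carol = Person(bytearray(family_name))
carol.name.extend(b"!")
carol_is_independent = family_name == bytearray(b"Smith-Jones")
```

The change to `alice.name` propagates to `bob.name` because both refer to the same object. That propagation is occasionally what one wants, but in most code it is an accident.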
A debate similar in content seems to have taken place when Eiffel Software decided to deprecate expanded as an entity attribute. The point was that making a type of objects expanded is an important design decision and that, therefore, the decision should be made in the text of the class affected.
I have not seen the terminology used anywhere else, but I would dare call expanded classes pure abstract data types. The reason is that, with them, whether we implement commands as functions or as procedures does not change the semantics, since they have no aliasing properties.
By contrast, I would put greater emphasis on the abstract machine nature of the other classes. In those cases, object identity has a most important role to play.
Coming back to strings, I think the nature of a string as a pure abstract data type predominates over its role as a machine, since its identity is rarely needed. I would go further and add that we care about a string's identity so rarely that we could create a class STRING_XX_REF and we would see it appear as rarely as
The aim of immutable strings, if I get it right, is to get rid of the devious effects of aliasing. I think their greatest quality is their efficiency, and I don't think that is sufficient to support their wide adoption.
An interesting example of a problem that exists with strings, and that would not disappear with the advent of immutable strings, is the case of collections of strings. Whenever someone wants to create a collection of strings, they have to develop the reflex of setting the collection to use object semantics. In other words, most of the time, the default comparison mode for a collection of strings is wrong.
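To make the collection problem concrete, here is a minimal sketch of a collection whose membership test compares references by default and compares values only once object semantics has been switched on. The class `StringList` and its features `extend`, `has` and `compare_objects` are hypothetical names chosen for this sketch, and `bytearray` again stands in for a string.

```python
class StringList:
    """Hypothetical collection; membership defaults to reference comparison."""

    def __init__(self):
        self.items = []
        self.object_comparison = False   # default: compare identities

    def compare_objects(self):
        # Switch the collection to object (value) semantics.
        self.object_comparison = True

    def extend(self, item):
        self.items.append(item)

    def has(self, item):
        if self.object_comparison:
            return any(x == item for x in self.items)   # same content
        return any(x is item for x in self.items)       # same object


names = StringList()
names.extend(bytearray(b"hello"))

probe = bytearray(b"hello")              # equal content, different object
found_by_default = names.has(probe)      # reference comparison misses it

names.compare_objects()
found_with_object_semantics = names.has(probe)
```

With the default setting the lookup fails even though an equal string is present. Note that making the strings immutable would not change this outcome, since the two objects would still be distinct.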
To address the efficiency issue, I think it is possible to limit the cost of copying expanded strings by copying only the reference to the SPECIAL implementing the string whenever the string is copied. A flag can also be kept to tell whether or not the string has been copied, ensuring that the SPECIAL is duplicated before any write operation if it is shared.
The downside of this particular implementation is that it is pessimistic: a string's implementation can be duplicated even if it is no longer shared, because the `shared' flag is itself not shared between the copies. This can be seen, for example, if a string is copied and both copies are modified. In that case, the representation will be duplicated twice instead of once. As far as I know, it's the best we can do without changing the runtime to allow reference counters.
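The scheme and its pessimistic behaviour can be sketched as follows. This is an illustration under stated assumptions rather than a proposed implementation: `CowString` is a hypothetical class, a Python list stands in for the SPECIAL, and the `duplications` counter exists only to make the extra copy observable.

```python
class CowString:
    """Value-like string: copies share one buffer until one of them writes."""

    duplications = 0   # class-wide counter, only to observe the behaviour

    def __init__(self, text=""):
        self._buffer = list(text)   # stands in for the SPECIAL
        self._shared = False        # per-string flag, not shared by copies

    def copy(self):
        twin = CowString()
        twin._buffer = self._buffer   # copy only the reference
        twin._shared = True
        self._shared = True
        return twin

    def _ensure_private(self):
        # Duplicate the buffer before writing if it may be shared.
        if self._shared:
            self._buffer = list(self._buffer)
            self._shared = False
            CowString.duplications += 1

    def append(self, text):
        self._ensure_private()
        self._buffer.extend(text)

    def __str__(self):
        return "".join(self._buffer)


a = CowString("abc")
b = a.copy()      # no duplication yet, both share one buffer
a.append("1")     # first duplication: a gets its own buffer
b.append("2")     # second duplication, although b is now the sole owner
```

After the first write, `b` is the only string still using the original buffer, but its own flag still says "shared", so it duplicates anyway. Each copy can only consult its own flag, which is why the count ends at two rather than one.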
In summary, my objection to the immutable string solution is that it only partially solves the problem caused by mutable, aliasable strings. Furthermore, it splits across several classes an abstraction which would benefit from consolidation instead.