eiffelroom

Read-only strings

colin-adams's picture
Why aren't literal strings in Eiffel read-only? Indeed, why isn't class STRING read-only? I think (as do others) that Java gets it right in this respect, with its separation of String and StringBuffer.

All sorts of hard-to-debug problems can arise because there is no read-only variant of STRING. We hit one of these in the Gobo XML library: the XML parser was corrupted because one of the event filters was editing a string emitted by the parser.
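A minimal sketch of the kind of aliasing involved, not the actual Gobo code: the class name ALIASING_DEMO and the variable names are made up for illustration. It only shows that reference assignment shares one STRING object between two holders.

```eiffel
class
    ALIASING_DEMO

create
    make

feature {NONE} -- Initialization

    make
            -- Show how editing a shared STRING changes it for every holder.
        local
            parser_name, filter_name: STRING
        do
            create parser_name.make_from_string ("element")
                -- Reference assignment: no copy is made.
            filter_name := parser_name
                -- The "filter" edits what it believes is its own string.
            filter_name.append ("-edited")
                -- The "parser" sees the edit as well.
            print (parser_name)
            print ("%N")
        end

end
```

Nothing in the type of parser_name warns the reader that another holder may edit it, which is what makes such bugs hard to trace.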

Our work-around for this problem was the following class:

indexing
    description: "STRINGs with copy-on-write semantics"
    library: "Gobo Eiffel String Library"
    copyright: "Copyright (c) 2005, Colin Adams and others"
    license: "MIT License"
    date: "$Date: 2007-01-26 18:55:25 +0000 (Fri, 26 Jan 2007) $"
    revision: "$Revision: 5877 $"

class interface
    ST_COPY_ON_WRITE_STRING

create
    make

feature -- Access

    item: STRING_8
            -- String

    safe_item: STRING_8
            -- Version of item that is safe for editing
        ensure
            safe_to_edit: changed
            same_as_item: Result /= Void and then Result = item
   
feature -- Element change

    append_character (c: CHARACTER_8)
            -- Append `c' at end.
        ensure
            new_count: item.count = old item.count + 1
            appended: item.item (item.count) = c
            safe_to_edit: changed

    append_string (s: STRING_8)
            -- Append a copy of `s' at end.
        require
            s_not_void: s /= Void
        ensure
            safe_to_edit: changed

    fill_with (c: CHARACTER_8)
            -- Replace every character with `c'.
        ensure
            same_count: old item.count = item.count
            filled: item.occurrences (c) = item.count
            safe_to_edit: changed

    insert_character (c: CHARACTER_8; i: INTEGER_32)
            -- Insert `c' at index `i', shifting characters between
            -- ranks `i' and `count' rightwards.
        require
            valid_insertion_index: 1 <= i and i <= item.count + 1
        ensure
            one_more_character: item.count = old item.count + 1
            inserted: item.item (i) = c
            safe_to_edit: changed

    put (c: CHARACTER_8; i: INTEGER_32)
            -- Replace character at index `i' by `c'
        require
            valid_index: item.valid_index (i)
        ensure
            stable_count: item.count = old item.count
            replaced: item.item (i) = c
            safe_to_edit: changed
   
invariant
    item_not_void: item /= Void

end -- class ST_COPY_ON_WRITE_STRING

Hm. Now that I look at it, the postcondition of append_string could be strengthened.
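One possible strengthening, written in the same interface-view style, is sketched below. It is only a guess at what a stronger contract might look like: the new_count and appended clauses are not part of the Gobo class, and the sketch assumes a STRING version that provides same_string.

```eiffel
    append_string (s: STRING_8)
            -- Append a copy of `s' at end.
        require
            s_not_void: s /= Void
        ensure
                -- Hypothetical additions: state the new length and content.
            new_count: item.count = old item.count + old s.count
            appended: item.same_string (old (item + s))
            safe_to_edit: changed
```

Using old s.count rather than s.count keeps the clause meaningful even when s happens to be the same object as item.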

The _8 suffixes are only present because I used the EiffelStudio interface view.

This class can be used in a lot of situations to avoid problems. The basic idea is to share a string until someone needs to edit it, so that strings are not duplicated needlessly.
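A usage sketch, under stated assumptions: the interface view above lists make without its signature, so the code assumes that make takes the string to wrap. COW_DEMO and the variable names are hypothetical.

```eiffel
class
    COW_DEMO

create
    make

feature {NONE} -- Initialization

    make
            -- Edit a shared string through a copy-on-write wrapper.
        local
            original: STRING
            wrapper: ST_COPY_ON_WRITE_STRING
        do
            create original.make_from_string ("title")
                -- Assumption: `make' takes the string to wrap.
            create wrapper.make (original)
                -- Reading should need no copy.
            print (wrapper.item)
            print ("%N")
                -- The first edit is expected to copy, leaving `original' alone.
            wrapper.append_character ('!')
            print (original)
            print ("%N")
            print (wrapper.item)
            print ("%N")
        end

end
```

Callers that only read pay nothing extra, while a caller that edits works on its own copy, which is the behaviour the event filters needed.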

But it would be much better if we could follow the Java line here.

I think it ought to be possible to do this without breaking (much) existing code, by making use of the convert keyword. Rather than having STRING_GENERAL inherit from READ_ONLY_STRING_GENERAL, we have parallel hierarchies, and say that the read-only versions can convert to the existing versions (the creation procedures involved would of course copy the characters of the string).

Then we could change string literals to be of type READ_ONLY_STRING (or rather, one of its aliases), and everything should work just fine.
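A rough sketch of what one of the parallel read-only classes might look like. Everything here is hypothetical, including the feature names. The text above speaks of conversion through creation procedures in the mutable classes; the sketch uses a conversion query in the read-only class instead, so that it does not need to modify STRING.

```eiffel
class
    READ_ONLY_STRING
        -- Hypothetical class: an immutable string that converts to STRING.

create
    make_from_string

convert
    to_string: {STRING}

feature {NONE} -- Initialization

    make_from_string (s: STRING)
            -- Store a private copy of `s'.
        require
            s_not_void: s /= Void
        do
            storage := s.twin
        end

feature -- Access

    count: INTEGER
            -- Number of characters
        do
            Result := storage.count
        end

    item (i: INTEGER): CHARACTER
            -- Character at index `i'
        require
            valid_index: 1 <= i and i <= count
        do
            Result := storage.item (i)
        end

feature -- Conversion

    to_string: STRING
            -- Fresh mutable copy; editing it cannot affect Current.
        do
            Result := storage.twin
        ensure
            result_not_void: Result /= Void
        end

feature {NONE} -- Implementation

    storage: STRING
            -- Private characters, never handed out directly

invariant
    storage_not_void: storage /= Void

end
```

With such a convert clause, an assignment of a READ_ONLY_STRING to an entity of type STRING would compile and silently call to_string, which is why existing code expecting STRING would mostly keep working, at the cost of one copy per conversion.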

peter_gummer's picture

Mutable strings are evil

I, too, often encounter hard-to-track-down bugs in Eiffel code caused by the fact that STRING is mutable. Every other OO language that I've worked with (Delphi, C#, Java) treats strings as immutable. This is a bit weird, because it means that strings are reference types with some expanded semantics; so the Eiffel approach is more consistent with the rest of the type system. But having a few years of Eiffel development under my belt now, I can safely say that it's not just a prejudice based on what I'm used to: mutable strings are evil!

colin-adams's picture

CONSTANT_STRING

CONSTANT_STRING_GENERAL, CONSTANT_STRING_8 and CONSTANT_STRING_32 look like better names. Colin Adams

colin-adams's picture

Performance of string comparisons

The disadvantage of using convert is that the cost of string comparisons, already an expensive operation, is increased by the need to create a temporary object. Colin Adams

paulbates's picture

I agree

Immutable STRING variants are something that I've brought up a number of times. I fully agree that Eiffel needs immutable strings. Keep up the comments.

colin-adams's picture

Questionnaire

It might be worth running a poll on this. Options such as:

  1. Keep the status quo
  2. Have STRING_GENERAL inherit CONSTANT_STRING_GENERAL (breaking all existing Eiffel code)
  3. Adopt my suggestion, despite the performance implications

Colin Adams

Inherit and convert

Can't we have both 2) and 3)? Mutable strings would conform to constant strings and constant strings would convert to mutable strings. Comparison would take constant strings as arguments, so only conformance would be involved (no performance implications).

The problem with STRING_GENERAL inheriting from CONSTANT_STRING_GENERAL is this: declaring an argument as constant is a good way for a feature to state that it won't alter the strings passed to it, but it does not mean that the string will not be modified by another feature (or by the same feature after an assignment attempt) if its dynamic type is in fact that of a mutable string. So we think that the string passed as CONSTANT_STRING_GENERAL will not change, but it can. In fact that's not surprising. Even if the inheritance is appealing, a mutable string is not a constant string. It's like having RECTANGLE inherit from SQUARE.

Hmmm, so I guess that in order to be 100% safe we need conversion in both directions. For comparison and performance, we probably need a common ancestor of STRING_GENERAL and CONSTANT_STRING_GENERAL. READONLY_STRING_GENERAL? We can only read the content of the string through this interface, but it's not necessarily a constant string. Polymorphically it can be attached to a mutable string whose content can be modified.
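The point about polymorphic attachment can be illustrated with a short sketch. It assumes an EiffelBase release that ships a class READABLE_STRING_8 as an ancestor of STRING_8; the names proposed in this thread differ, and READABLE_VIEW_DEMO is a made-up class.

```eiffel
class
    READABLE_VIEW_DEMO

create
    make

feature {NONE} -- Initialization

    make
            -- Show that a read-only view is not a guarantee of constancy.
        local
            mutable: STRING_8
            view: READABLE_STRING_8
        do
            create mutable.make_from_string ("abc")
                -- Conformance: no copy, `view' and `mutable' are one object.
            view := mutable
            print (view.count)
            print ("%N")
                -- `view' offers no editing features, yet its content changes.
            mutable.append_character ('d')
            print (view.count)
            print ("%N")
        end

end
```

The second count printed is one larger than the first, even though nothing was ever changed through view. A truly constant type would have to rule out this attachment, which is why conversion rather than conformance is needed for full safety.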

colin-adams's picture

Comparison

I think READABLE_STRING is better than READONLY_STRING_GENERAL (because the dynamic type may also be writable, and I think we will not need _8 and _32 descendants for this class).

For comparison, there are same_string and is_equal. same_string can accept a READABLE_STRING, and so avoid a conversion, but what about is_equal? It takes a like Current argument, and so a conversion will be involved. Colin Adams

paulbates's picture

Taking the discussion elsewhere.

I'm in the process of creating a Wiki page with rationale and implementation suggestions. I'll post a link when it is in at least a legible draft state.

paulbates's picture

Done...

Wiki document on Immutable Strings.

mtn's picture

I think the signature of is_equal is subject to change anyway.

like Current will be replaced with ANY IIRC.

-- mTn-_-|
