<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.eiffelroom.com" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>eiffelroom - Unicode - Comments</title>
 <link>http://www.eiffelroom.com/tag/unicode</link>
 <description>Comments for &quot;Unicode&quot;</description>
 <language>en</language>
<item>
 <title>Relevance</title>
 <link>http://www.eiffelroom.com/blog/colin_adams/mixing_unicode_and_latin_1_class_texts#comment-173</link>
 <description>&lt;p&gt;The main benefit for homogeneous clusters is simpler heuristics - there is no possibility of confusing Latin-1 with UTF-8.&lt;/p&gt;

&lt;p&gt;If you can&#039;t eliminate the possibility of one or the other, I know of no way to disambiguate them in general.&lt;/p&gt;

&lt;p&gt;Of course, there are lots of cases where it is easier to see which of the two is meant. But in other cases, not.&lt;/p&gt;

&lt;p&gt;So, start from the case where the file is pure ASCII. Did the author intend it to be treated as Latin-1 or UTF-8?&lt;/p&gt;

&lt;p&gt;In this case, it doesn&#039;t matter (the only possibility is the type of manifest string constants, but these are defined to be of type STRING).&lt;/p&gt;

&lt;p&gt;But all we have to do is to mutate one character in a string literal, and immediately (if we choose the mutation carefully), the case becomes undecidable.&lt;/p&gt;
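&lt;p&gt;A minimal illustrative sketch of such a mutation (Python, with a hypothetical two-byte payload standing in for the body of a string literal): both decodings succeed, so nothing in the bytes themselves says which encoding was intended.&lt;/p&gt;

```python
# Hypothetical two-byte payload standing in for the body of a string literal.
raw = bytes([0xC3, 0xA9])

as_utf8 = raw.decode("utf-8")      # one character: U+00E9
as_latin1 = raw.decode("latin-1")  # two characters: U+00C3, U+00A9

# Both decodings succeed, so the bytes alone cannot say which was meant.
print(len(as_utf8), len(as_latin1))

# A pure ASCII payload decodes identically under both encodings.
ascii_raw = b"True"
same_for_ascii = ascii_raw.decode("utf-8") == ascii_raw.decode("latin-1")
print(same_for_ascii)
```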

&lt;p&gt;Colin Adams&lt;/p&gt;

</description>
 <pubDate>Sun, 01 Apr 2007 10:29:33 -0700</pubDate>
 <dc:creator>colin-adams</dc:creator>
 <guid isPermaLink="false">comment 173 at http://www.eiffelroom.com</guid>
</item>
<item>
 <title>Is this relevant?</title>
 <link>http://www.eiffelroom.com/blog/colin_adams/mixing_unicode_and_latin_1_class_texts#comment-165</link>
 <description>&lt;p&gt;The source code will be using plain text file (i.e. sequence of character codes that are between 0 and 255), UTF-8 or any other Unicode encoding. Once you have the encoding then the semantics is properly defined.&lt;/p&gt;

&lt;p&gt;Of course if one library author is using Unicode characters beyond 255, the user of that library will be forced to use a Unicode encoding for his source code, but is this relevant to the project specification? I don&#039;t think so.&lt;/p&gt;

</description>
 <pubDate>Fri, 30 Mar 2007 10:12:00 -0700</pubDate>
 <dc:creator>manus_eiffel</dc:creator>
 <guid isPermaLink="false">comment 165 at http://www.eiffelroom.com</guid>
</item>
<item>
 <title>Heuristics</title>
 <link>http://www.eiffelroom.com/blog/colin_adams/mixing_unicode_and_latin_1_class_texts#comment-164</link>
 <description>&lt;p&gt;See also &lt;a href=&quot;http://eiffelsoftware.origo.ethz.ch/index.php/Heuristics_for_detecting_class_text_encoding&quot;&gt;Heuristics for detecting class text encoding&lt;/a&gt;. Colin Adams&lt;/p&gt;

</description>
 <pubDate>Fri, 30 Mar 2007 10:07:50 -0700</pubDate>
 <dc:creator>colin-adams</dc:creator>
 <guid isPermaLink="false">comment 164 at http://www.eiffelroom.com</guid>
</item>
<item>
 <title>I believe it does not matter</title>
 <link>http://www.eiffelroom.com/blog/peter_gummer/utf_8_unicode_in_eiffel_for_net#comment-147</link>
 <description>&lt;p&gt;I believe it does not matter whether or not those routines are in STRING_GENERAL. It might be better to have them outside, possibly that you may want to serialize the data in something else than a string and to reduce code duplication it makes more sense outside.&lt;/p&gt;

&lt;p&gt;For {STRING_32}.out, I don&#039;t think it is a major issue. The default implementation of `out&#039; is compiler defined at the moment to be STRING. In the future we may want to change this to be STRING_32, but for the time being, being a truncated version of the STRING_32 representation is fine to me since `out&#039; has different semantics depending on the Eiffel class. In my opinion using `out&#039; for encoding would be really wrong.&lt;/p&gt;

</description>
 <pubDate>Mon, 19 Mar 2007 14:17:00 -0700</pubDate>
 <dc:creator>manus_eiffel</dc:creator>
 <guid isPermaLink="false">comment 147 at http://www.eiffelroom.com</guid>
</item>
<item>
 <title>Serializing</title>
 <link>http://www.eiffelroom.com/blog/peter_gummer/utf_8_unicode_in_eiffel_for_net#comment-146</link>
 <description>&lt;p&gt;I agree that the encoding form doesn&#039;t matter, and that the compiler should be free to choose whichever it likes (and UTF-32 would be my choice too).&lt;/p&gt;

&lt;p&gt;But for serializing, STRING_GENERAL should have the following routines (bodies omitted):&lt;/p&gt;

&lt;p&gt;&lt;div class=&quot;geshifilter eiffel&quot; style=&quot;font-family: monospace;&quot;&gt;to_utf8: !STRING_8 &lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;is&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;span style=&quot;color: #008000; font-style: italic;&quot;&gt;-- Serialization of `Current&#039; as bytes of UTF-8 representation.&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;do&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;ensure&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; not_shorter: &lt;span style=&quot;color: #800080;&quot;&gt;Result&lt;/span&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;count&lt;/span&gt; &amp;gt;= count&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;end&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
to_utf_16_be:&amp;nbsp; !STRING_8 &lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;is&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;span style=&quot;color: #008000; font-style: italic;&quot;&gt;-- Serialization of `Current&#039; as bytes of UTF-16BE representation.&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;do&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;ensure&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; not_shorter: &lt;span style=&quot;color: #800080;&quot;&gt;Result&lt;/span&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;count&lt;/span&gt; &amp;gt;= &lt;span style=&quot;color: #FF0000;&quot;&gt;2&lt;/span&gt; * count&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;end&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
to_utf_16_le:&amp;nbsp; !STRING_8 &lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;is&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;span style=&quot;color: #008000; font-style: italic;&quot;&gt;-- Serialization of `Current&#039; as bytes of UTF-16LE representation.&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;do&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;ensure&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; not_shorter: &lt;span style=&quot;color: #800080;&quot;&gt;Result&lt;/span&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;count&lt;/span&gt; &amp;gt;= &lt;span style=&quot;color: #FF0000;&quot;&gt;2&lt;/span&gt; * count&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;end&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
to_utf_32_be:&amp;nbsp; !STRING_8 &lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;is&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;span style=&quot;color: #008000; font-style: italic;&quot;&gt;-- Serialization of `Current&#039; as bytes of UTF-32BE representation.&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;do&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;ensure&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; four_times_longer: &lt;span style=&quot;color: #800080;&quot;&gt;Result&lt;/span&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;count&lt;/span&gt; = &lt;span style=&quot;color: #FF0000;&quot;&gt;4&lt;/span&gt; * count&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;end&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
to_utf_32_le:&amp;nbsp; !STRING_8 &lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;is&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;span style=&quot;color: #008000; font-style: italic;&quot;&gt;-- Serialization of `Current&#039; as bytes of UTF-32LE representation.&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;do&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;ensure&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; four_times_longer: &lt;span style=&quot;color: #800080;&quot;&gt;Result&lt;/span&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;count&lt;/span&gt; = &lt;span style=&quot;color: #FF0000;&quot;&gt;4&lt;/span&gt; * count&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;end&lt;/span&gt;&lt;/div&gt; The question remains what {STRING_32}.out should produce. Perhaps it should be platform-specific (and may be finer grained than just Windows v. POSIX - different Windows configurations may have different natural defaults - I&#039;m not sure about this).&lt;/p&gt;
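&lt;p&gt;The length postconditions above can be checked by hand on a small example. This is only a sketch in Python, using its built-in codecs rather than the proposed Eiffel routines; the sample string is hypothetical and chosen to include a character outside the Basic Multilingual Plane, which is why the UTF-16 bound is an inequality and not an equality.&lt;/p&gt;

```python
# Hypothetical sample: one ASCII letter, one Latin-1 letter,
# and one character outside the Basic Multilingual Plane.
text = "a\u00e9\U0001D11E"
count = len(text)  # number of code points

utf8_len = len(text.encode("utf-8"))          # 1 + 2 + 4 bytes
utf16_be_len = len(text.encode("utf-16-be"))  # 2 + 2 + 4 bytes (surrogate pair)
utf32_be_len = len(text.encode("utf-32-be"))  # 4 bytes per code point

print(count, utf8_len, utf16_be_len, utf32_be_len)
```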

&lt;p&gt;Colin Adams&lt;/p&gt;

</description>
 <pubDate>Mon, 19 Mar 2007 07:42:04 -0700</pubDate>
 <dc:creator>colin-adams</dc:creator>
 <guid isPermaLink="false">comment 146 at http://www.eiffelroom.com</guid>
</item>
<item>
 <title>The form doesn&#039;t matter</title>
 <link>http://www.eiffelroom.com/blog/peter_gummer/utf_8_unicode_in_eiffel_for_net#comment-145</link>
 <description>&lt;p&gt;The form doesn&#039;t matter since it is most likely hidden from the user point of view (the user manipulate a sequence of characters and nothing more). Nevertheless, it might be better if it is kept 32-bit since it is faster.&lt;/p&gt;

&lt;p&gt;Regarding the encoding scheme, it cannot be set on the application level since many libraries might choose a different encoding, or you might have a need to read different encoding. So it has to be configurable and this should be outside the STRING class.&lt;/p&gt;

</description>
 <pubDate>Sun, 18 Mar 2007 16:56:00 -0700</pubDate>
 <dc:creator>manus_eiffel</dc:creator>
 <guid isPermaLink="false">comment 145 at http://www.eiffelroom.com</guid>
</item>
<item>
 <title>Unicode encoding</title>
 <link>http://www.eiffelroom.com/blog/peter_gummer/utf_8_unicode_in_eiffel_for_net#comment-136</link>
 <description>&lt;p&gt;There are two Unicode encodings to be considered - the Unicode Encoding Form, and the Unicode Encoding Scheme.&lt;/p&gt;

&lt;p&gt;The former is one of UTF-8, UTF-16, and UTF-32. This is what is used internally within the program. It is a classic time/space trade-off. ISE uses UTF-32, which wastes memory to save computing time. Either UTF-16 or UTF-8 would be slower, but UTF-16 is rarely significantly slower. I don&#039;t see any linguistic or cultural differences affecting the issue.&lt;/p&gt;

&lt;p&gt;The Unicode Encoding Schemes are byte serializations of the encoding forms. The full list is UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE. The natural default tends to map to your computer hardware + O/S, except there is a disk-space consideration here: The UTF-32* use more disk space. Which is the most economical DOES depend upon the linguistic culture - in Europe UTF-8 is cheapest, in East Asia, UTF-16* are cheapest.  I don&#039;t know what {STRING_32}.out produces with ISE 5.7.&lt;/p&gt;
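&lt;p&gt;A rough way to see the disk-space point, as a sketch in Python: the two sample strings are hypothetical, and the byte counts are for the BOM-less little-endian schemes.&lt;/p&gt;

```python
# Hypothetical sample strings; any short text in each script would do.
samples = {
    "french": "\u00e9t\u00e9 fran\u00e7ais",
    "japanese": "\u65e5\u672c\u8a9e\u306e\u6587\u5b57",
}

# Byte count of each sample under three encoding schemes.
sizes = {}
for name, text in samples.items():
    sizes[name] = {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }
    print(name, sizes[name])
```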

&lt;p&gt;So there are two possible sets of set_unicode_encoding features.&lt;/p&gt;

&lt;p&gt;Colin Adams&lt;/p&gt;

</description>
 <pubDate>Sun, 18 Mar 2007 01:30:00 -0700</pubDate>
 <dc:creator>colin-adams</dc:creator>
 <guid isPermaLink="false">comment 136 at http://www.eiffelroom.com</guid>
</item>
<item>
 <title>STRING_8/STRING_32</title>
 <link>http://www.eiffelroom.com/blog/peter_gummer/utf_8_unicode_in_eiffel_for_net#comment-135</link>
 <description>&lt;p&gt;I would also really like to know why we should have STRING_8 etc. It makes no sense to me at all. Currently the compiler is doing tricks to convert all STRINGs in our code to STRING_8, but certain things like checking generated_type.is_equal(&amp;quot;STRING&amp;quot;) break; anywhere where dynamic_type() from INTERNAL is used with STRING objects might or might not work. And I don&#039;t see why issues of UTF encoding (not the same as unicode per se) should be exposed at a developer-visible level.&lt;/p&gt;

&lt;p&gt;What we need is to be able to say at the beginning of an application set_unicode_encoding_utf8 or set_unicode_encoding_utf16 and everything just works. The default should be whichever makes sense in your linguistic culture (UTF-8 in all european languages).&lt;/p&gt;

&lt;p&gt;- thomas&lt;/p&gt;

</description>
 <pubDate>Sat, 17 Mar 2007 19:15:33 -0700</pubDate>
 <dc:creator>thomas.beale</dc:creator>
 <guid isPermaLink="false">comment 135 at http://www.eiffelroom.com</guid>
</item>
<item>
 <title>Real issue</title>
 <link>http://www.eiffelroom.com/blog/peter_gummer/utf_8_unicode_in_eiffel_for_net#comment-134</link>
 <description>&lt;p&gt;What I meant is that it is not an issue of encoding or character set, but an issue with memory representation of Eiffel strings. Indeed the existing legacy C code wrapping are using this directly. So changing STRING to support unicode will break those programs. This is one of the reason why EiffelBase introduced C_STRING so that it works regardless of the memory representation of Eiffel strings.&lt;/p&gt;

</description>
 <pubDate>Sat, 17 Mar 2007 09:27:28 -0700</pubDate>
 <dc:creator>manus_eiffel</dc:creator>
 <guid isPermaLink="false">comment 134 at http://www.eiffelroom.com</guid>
</item>
<item>
 <title>My solution</title>
 <link>http://www.eiffelroom.com/blog/peter_gummer/utf_8_unicode_in_eiffel_for_net#comment-133</link>
 <description>&lt;p&gt;Well, two queries actually. &lt;div class=&quot;geshifilter eiffel&quot; style=&quot;font-family: monospace;&quot;&gt;maximum_code: &lt;a href=&quot;http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fdocs.eiffel.com%2Feiffelstudio%2Flibraries+INTEGER&amp;btnI=I%27m+Feeling+Lucky&quot;&gt;&lt;span style=&quot;color: #800000&quot;&gt;INTEGER&lt;/span&gt;&lt;/a&gt; &lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;is&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;span style=&quot;color: #008000; font-style: italic;&quot;&gt;-- Maximum value of `code&#039; permitted by `character_set&#039;&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;do&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;span style=&quot;color: #008000; font-style: italic;&quot;&gt;-- 255 for ISO-8859-x, 1114111 for Unicode&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;span style=&quot;color: #008000; font-style: italic;&quot;&gt;-- compiler can optimize this as a builtin query&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;span style=&quot;color: #008000; font-style: italic;&quot;&gt;-- in the case that only 1 character set is used in&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;span style=&quot;color: #008000; font-style: italic;&quot;&gt;-- a system. Otherwise it would be an attribute&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;ensure&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; positive_code: &lt;span style=&quot;color: #800080;&quot;&gt;Result&lt;/span&gt; &amp;gt; &lt;span style=&quot;color: #FF0000;&quot;&gt;0&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;end&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;and&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
character_set: &lt;a href=&quot;http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fdocs.eiffel.com%2Feiffelstudio%2Flibraries+STRING&amp;btnI=I%27m+Feeling+Lucky&quot;&gt;&lt;span style=&quot;color: #800000&quot;&gt;STRING&lt;/span&gt;&lt;/a&gt; &lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;is&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;span style=&quot;color: #008000; font-style: italic;&quot;&gt;-- Name of character set used in `Current&#039;&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;do&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #008000; font-style: italic;&quot;&gt;-- same considerations as for `maximum_code&#039;&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;ensure&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp;result_not_empty: &lt;span style=&quot;color: #800080;&quot;&gt;Result&lt;/span&gt; /= &lt;span style=&quot;color: #800080;&quot;&gt;Void&lt;/span&gt; &lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;and&lt;/span&gt; &lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;then&lt;/span&gt; &lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;not&lt;/span&gt; &lt;span style=&quot;color: #800080;&quot;&gt;Result&lt;/span&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;is_empty&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp;result_is_ascii: &lt;span style=&quot;color: #008000; font-style: italic;&quot;&gt;-- whatever &lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;end&lt;/span&gt;&lt;/div&gt;&lt;/p&gt;

&lt;p&gt;With these two queries, all incompatibilities can be coped with (including your c-string stuff - the compiler can include transcoding when necessary).&lt;/p&gt;

&lt;p&gt;Note that the latter query was also needed for ETL2, for supporting multiple or alternate encodings, such as ISO-8859-2.&lt;/p&gt;

</description>
 <pubDate>Sat, 17 Mar 2007 05:25:18 -0700</pubDate>
 <dc:creator>colin-adams</dc:creator>
 <guid isPermaLink="false">comment 133 at http://www.eiffelroom.com</guid>
</item>
<item>
 <title>Solution?</title>
 <link>http://www.eiffelroom.com/blog/peter_gummer/utf_8_unicode_in_eiffel_for_net#comment-132</link>
 <description>&lt;p&gt;The issue is that you had legacy code wrapping C interfaces required 8-bit strings and having Unicode strings would have broken those API. Therefore the separation was and is still needed.&lt;/p&gt;

&lt;p&gt;What do you mean by a query?&lt;/p&gt;

</description>
 <pubDate>Sat, 17 Mar 2007 00:14:56 -0700</pubDate>
 <dc:creator>manus_eiffel</dc:creator>
 <guid isPermaLink="false">comment 132 at http://www.eiffelroom.com</guid>
</item>
<item>
 <title>Multiple string types</title>
 <link>http://www.eiffelroom.com/blog/peter_gummer/utf_8_unicode_in_eiffel_for_net#comment-131</link>
 <description>&lt;p&gt;I don&#039;t think there was a backwards-compatibility problem - at least, not one that needed the STRING_*/STRING_32 separation as a solution. A much simpler fix of adding a query would have done the trick.&lt;/p&gt;

</description>
 <pubDate>Fri, 16 Mar 2007 23:24:55 -0700</pubDate>
 <dc:creator>colin-adams</dc:creator>
 <guid isPermaLink="false">comment 131 at http://www.eiffelroom.com</guid>
</item>
<item>
 <title>Flip of a coin</title>
 <link>http://www.eiffelroom.com/blog/peter_gummer/utf_8_unicode_in_eiffel_for_net#comment-130</link>
 <description>&lt;p&gt;That would be soon possible when we have converted all our legacy code that only handle STRING_8 will be adapted to work with STRING_32 as well.&lt;/p&gt;

</description>
 <pubDate>Fri, 16 Mar 2007 22:19:00 -0700</pubDate>
 <dc:creator>manus_eiffel</dc:creator>
 <guid isPermaLink="false">comment 130 at http://www.eiffelroom.com</guid>
</item>
<item>
 <title>Limitations - Yes, UTF-8</title>
 <link>http://www.eiffelroom.com/blog/peter_gummer/utf_8_unicode_in_eiffel_for_net#comment-128</link>
 <description>&lt;p&gt;Yes, the assumption here is that the strings are all UTF-8. (I tried to make that clear, especially in the title). This assumption is sound for our purposes.&lt;/p&gt;

&lt;p&gt;For this reason, the official EiffelStudio version of &lt;code class=&quot;geshifilter eiffel&quot;&gt;SYSTEM_STRING_FACTORY&lt;/code&gt; probably should &lt;em&gt;not&lt;/em&gt; adopt my &amp;quot;fix&amp;quot;. Other good reasons for EiffelStudio to come up with a better fix than this are that my implementation is inefficient, and it would create a dependency of the &lt;code class=&quot;geshifilter&quot;&gt;base&lt;/code&gt; library on the &lt;code class=&quot;geshifilter&quot;&gt;gobo&lt;/code&gt; library. I don&#039;t mind my own project having a dependency on Gobo - the project already uses Gobo - but this is not ok in general.&lt;/p&gt;

&lt;p&gt;I agree with your idea of generating &lt;code class=&quot;geshifilter eiffel&quot;&gt;STRING_32&lt;/code&gt; when reading the data. I was thinking along those lines when I attempted (unsuccessfully) to map &lt;code class=&quot;geshifilter eiffel&quot;&gt;&lt;a href=&quot;http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fdocs.eiffel.com%2Feiffelstudio%2Flibraries+STRING&amp;btnI=I%27m+Feeling+Lucky&quot;&gt;&lt;span style=&quot;color: #800000&quot;&gt;STRING&lt;/span&gt;&lt;/a&gt;&lt;/code&gt; to &lt;code class=&quot;geshifilter eiffel&quot;&gt;STRING_32&lt;/code&gt;. I modified my project&#039;s config to use &lt;code class=&quot;geshifilter&quot;&gt;base&lt;/code&gt; as a &lt;em&gt;cluster&lt;/em&gt; rather than a library; then I copied all of the mappings from &lt;code class=&quot;geshifilter&quot;&gt;base.ecf&lt;/code&gt; to my config, editing &lt;code class=&quot;geshifilter eiffel&quot;&gt;&lt;a href=&quot;http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fdocs.eiffel.com%2Feiffelstudio%2Flibraries+STRING&amp;btnI=I%27m+Feeling+Lucky&quot;&gt;&lt;span style=&quot;color: #800000&quot;&gt;STRING&lt;/span&gt;&lt;/a&gt;&lt;/code&gt; to map it to &lt;code class=&quot;geshifilter eiffel&quot;&gt;STRING_32&lt;/code&gt; rather than &lt;code class=&quot;geshifilter eiffel&quot;&gt;STRING_8&lt;/code&gt;. But I quickly abandoned that route, because it wouldn&#039;t even compile. 
Some line in a library (something like &lt;code class=&quot;geshifilter eiffel&quot;&gt;true_string: &lt;a href=&quot;http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fdocs.eiffel.com%2Feiffelstudio%2Flibraries+STRING&amp;btnI=I%27m+Feeling+Lucky&quot;&gt;&lt;span style=&quot;color: #800000&quot;&gt;STRING&lt;/span&gt;&lt;/a&gt; &lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;is&lt;/span&gt; &lt;span style=&quot;color: #0080A0;&quot;&gt;&amp;quot;True&amp;quot;&lt;/span&gt;&lt;/code&gt;) couldn&#039;t convert a &lt;code class=&quot;geshifilter eiffel&quot;&gt;STRING_8&lt;/code&gt; to a &lt;code class=&quot;geshifilter eiffel&quot;&gt;STRING_32&lt;/code&gt;. I could have left the mapping alone, I suppose, and done a global search and replace of &lt;code class=&quot;geshifilter eiffel&quot;&gt;&lt;a href=&quot;http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fdocs.eiffel.com%2Feiffelstudio%2Flibraries+STRING&amp;btnI=I%27m+Feeling+Lucky&quot;&gt;&lt;span style=&quot;color: #800000&quot;&gt;STRING&lt;/span&gt;&lt;/a&gt;&lt;/code&gt; with &lt;code class=&quot;geshifilter eiffel&quot;&gt;STRING_32&lt;/code&gt; in our code; but that is very invasive, and I&#039;d be surprised if it worked given that the libraries we call would still be producing &lt;code class=&quot;geshifilter eiffel&quot;&gt;STRING_8&lt;/code&gt; objects.&lt;/p&gt;

&lt;p&gt;I really don&#039;t like this &lt;code class=&quot;geshifilter eiffel&quot;&gt;STRING_8&lt;/code&gt; / &lt;code class=&quot;geshifilter eiffel&quot;&gt;STRING_32&lt;/code&gt; idea. I programmed in C# for two years, developing an internationalised application, and I was barely conscious of the fact that my strings and characters were Unicode. It just worked. I acknowledge that Eiffel is contending with a backward-compatibility problem here, but I would be much happier if I could just flip a switch in the config file so that all of my &lt;code class=&quot;geshifilter eiffel&quot;&gt;&lt;a href=&quot;http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fdocs.eiffel.com%2Feiffelstudio%2Flibraries+STRING&amp;btnI=I%27m+Feeling+Lucky&quot;&gt;&lt;span style=&quot;color: #800000&quot;&gt;STRING&lt;/span&gt;&lt;/a&gt;&lt;/code&gt; objects instantly became Unicode.&lt;/p&gt;

</description>
 <pubDate>Fri, 16 Mar 2007 15:41:35 -0700</pubDate>
 <dc:creator>peter_gummer</dc:creator>
 <guid isPermaLink="false">comment 128 at http://www.eiffelroom.com</guid>
</item>
<item>
 <title>Limitations</title>
 <link>http://www.eiffelroom.com/blog/peter_gummer/utf_8_unicode_in_eiffel_for_net#comment-126</link>
 <description>&lt;p&gt;As far as I can tell, this implies that all your Eiffel strings are UTF-8, as otherwise it might not work for characters that are above 128. But if you get your data from UTF-8, wouldn&#039;t it be better to generate STRING_32 instead when reading the data. Once done, the STRING_32 would convert nicely with .NET System.String.&lt;/p&gt;

</description>
 <pubDate>Fri, 16 Mar 2007 08:37:37 -0700</pubDate>
 <dc:creator>manus_eiffel</dc:creator>
 <guid isPermaLink="false">comment 126 at http://www.eiffelroom.com</guid>
</item>
</channel>
</rss>
