String Equality and Interning

In the previous article, “Strings : Java and C#”, I discussed the concept of strings and some common functions for string manipulation. This time, I will mainly be talking about string equality and string interning.

So, let's start with string equality. Before starting, keep in mind that in both Java and C#, a string variable is nothing but a reference to a string object.

Because of the transparent optimizations performed by the runtime, it is important to know exactly when two strings are considered equal. Two string variables can be compared either by reference (do they point to the same object?) or by value (do they contain the same characters?). The two languages differ in their defaults: in C#, the == operator on strings compares values, whereas in Java, == compares object references and the equals method compares values.

Let's try to understand this with the following C# program.

using System;

public class StringTest
{
    public static void Main(string[] args)
    {
        char[] c = new char[] {'H','e','l','l','o'};
        string[] strings = new string[4];
        strings[0] = "Hello";
        strings[1] = "World";
        strings[2] = "Hello";
        strings[3] = new string(c);   // built at run time from a char array

        // 1. == as overloaded by System.String: compares values.
        Console.WriteLine("-----[ Comparing strings using == operator of System.String ]-----");
        for (int i = 0; i < strings.Length; ++i) {
            for (int j = i + 1; j < strings.Length; ++j) {
                Console.WriteLine("{0} = {1} {2}", i, j, strings[i] == strings[j]);
            }
        }

        // 2. == on System.Object: the casts force a reference comparison.
        Console.WriteLine("-----[ Comparing strings using == operator of System.Object ]-----");
        for (int i = 0; i < strings.Length; ++i) {
            for (int j = i + 1; j < strings.Length; ++j) {
                Console.WriteLine("{0} = {1}{2}", i, j,
                    (object)strings[i] == (object)strings[j]);
            }
        }

        // 3. Equals method: compares values, like the overloaded ==.
        Console.WriteLine("-----[ Comparing String using the Equals method ]-----");
        for (int i = 0; i < strings.Length; ++i) {
            for (int j = i + 1; j < strings.Length; ++j) {
                Console.WriteLine("{0} = {1}{2}", i, j,
                    strings[i].Equals(strings[j]));
            }
        }
    }
}

The output of the above code is:

-----[ Comparing strings using == operator of System.String ]-----
0 = 1 False
0 = 2 True
0 = 3 True
1 = 2 False
1 = 3 False
2 = 3 True
-----[ Comparing strings using == operator of System.Object ]-----
0 = 1False
0 = 2True
0 = 3False
1 = 2False
1 = 3False
2 = 3False
-----[ Comparing String using the Equals method ]-----
0 = 1False
0 = 2True
0 = 3True
1 = 2False
1 = 3False
2 = 3True

In C#, the == operator defined for System.Object compares references: it returns true if the two references point to the same object, and false otherwise. The System.String class overloads == to compare the values of the strings instead of the objects they refer to. Therefore, in the example above, strings[3] == strings[0] returns true because the two strings have the same value. However, when strings[0] and strings[3] are both cast to object and then compared with ==, the result is false, because strings[3] refers to a different string object. strings[0] and strings[2] still compare as equal after the cast because both come from the same literal, which the runtime stores only once.

C# also has the Equals method, which for strings behaves like the overloaded == operator: it compares values. Notice in the output above that the Equals method and the overloaded == operator give the same results.

Java programmers must be particularly careful with the == operator when working with C# strings, and C# programmers moving to Java even more so: in Java, == on two strings always compares references.

Note that in Java the equals method plays the role that the == operator plays for C# strings. The equals method is defined on java.lang.Object, where by default it compares object references. The java.lang.String class overrides equals to compare the string values.
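For comparison, here is a rough Java counterpart of the C# program above. It is only a sketch: the class name JavaStringEquality and the helper methods sameObject and sameValue are made up for this illustration.

```java
public class JavaStringEquality {

    // Reference comparison: true only when both variables point to the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // Value comparison: true when both strings contain the same characters.
    static boolean sameValue(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        char[] c = {'H', 'e', 'l', 'l', 'o'};
        String[] strings = {"Hello", "World", "Hello", new String(c)};

        for (int i = 0; i < strings.length; ++i) {
            for (int j = i + 1; j < strings.length; ++j) {
                System.out.printf("%d, %d: == %b, equals %b%n", i, j,
                        sameObject(strings[i], strings[j]),
                        sameValue(strings[i], strings[j]));
            }
        }
    }
}
```

The pairs (0, 3) and (2, 3) report false for == but true for equals, which matches the System.Object section of the C# output rather than the System.String section.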

Now, on to string interning.

String interning is a method of storing only one copy of each distinct string value; the strings must be immutable for this sharing to be safe.
Simply stated: the runtime keeps a table of interned strings while the application runs. When a string literal (or an explicitly interned string) has the same contents as an entry already in the table, no new object is allocated; a reference to the existing, identical string is returned instead.

String interning is supported by many modern object-oriented programming languages, including Python, Ruby (with its symbols), Java and the .NET languages.

The single copy of each string is called its 'intern' and is typically looked up by a method of the string class: the instance method String.intern() in Java, and the static method String.Intern(string) in C#.
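As a small illustration of the lookup in Java, the sketch below builds a string at run time and then asks for its interned copy. The class InternLookup and the helper buildAtRuntime are names invented for this example.

```java
public class InternLookup {

    // Builds a fresh String object at run time, so it is not the interned literal.
    static String buildAtRuntime(char[] chars) {
        return new String(chars);
    }

    public static void main(String[] args) {
        String literal = "Hello";
        String built = buildAtRuntime(new char[] {'H', 'e', 'l', 'l', 'o'});

        System.out.println(built == literal);           // false: two distinct objects
        System.out.println(built.intern() == literal);  // true: intern() returns the shared copy
        System.out.println(built.equals(literal));      // true: same characters
    }
}
```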

String equality and interning are closely related. Let's see how.

Both Java and C# intern string literals, so the JVM and the CLR each create only one object for all literals that have the same value. Different references to the same literal value therefore point to the same string object. Interning saves memory at the cost of the CPU time needed to detect and replace duplicates. There is only one copy of each interned string, no matter how many references point to it, and because strings are immutable that sharing is safe. Some descriptions go further and suggest a runtime could avoid a separate copy of "yoddha" when it occurs inside another literal such as "tech yoddha"; treat that as a possible implementation detail rather than something either platform guarantees. Interning also speeds up equality comparisons: == on two interned strings is a single reference check, and equals typically benefits too, because implementations usually test for reference identity before comparing characters.
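To make the "one copy, many references" point concrete, the following sketch counts distinct String objects by identity before and after interning. Everything in it (the class InternSharing and the helpers distinctObjects, duplicates and internAll) is hypothetical and exists only for this demonstration.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class InternSharing {

    // Counts how many distinct String objects (by identity, not by value) the array holds.
    static int distinctObjects(String[] values) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        Collections.addAll(seen, values);
        return seen.size();
    }

    // Creates `count` separate String objects that all hold the same characters.
    static String[] duplicates(String value, int count) {
        String[] result = new String[count];
        for (int i = 0; i < count; ++i) {
            result[i] = new String(value.toCharArray());
        }
        return result;
    }

    // Replaces every element with its interned (canonical) instance.
    static String[] internAll(String[] values) {
        String[] result = new String[values.length];
        for (int i = 0; i < values.length; ++i) {
            result[i] = values[i].intern();
        }
        return result;
    }

    public static void main(String[] args) {
        String[] copies = duplicates("tech yoddha", 1000);
        System.out.println(distinctObjects(copies));            // 1000 separate objects
        System.out.println(distinctObjects(internAll(copies))); // 1 shared object
    }
}
```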

The intern Gotcha

All string literals present at compile time are automatically interned. Only strings generated on the fly as the program runs might not be interned. A nasty side effect of this behaviour is that a program can work fine for simple cases but fail on complex ones. The problem arises when you use == to test for string equality where you should have used equals (this applies to Java, where == on strings compares references). The wrong code still works much of the time, because most of the strings it meets come from literals and are therefore interned.
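The gotcha is easy to reproduce in Java. The sketch below uses a deliberately buggy check next to a correct one; the class InternGotcha, the methods isAdminBuggy and isAdmin, and the "admin" role value are all invented for the example.

```java
public class InternGotcha {

    // Deliberately buggy check: compares references instead of contents.
    static boolean isAdminBuggy(String role) {
        return role == "admin";
    }

    // Correct check: compares contents.
    static boolean isAdmin(String role) {
        return "admin".equals(role);
    }

    public static void main(String[] args) {
        String fromLiteral = "admin";
        String fromConstantExpr = "ad" + "min";  // folded at compile time, so also interned
        String prefix = "ad";                    // not final, so not a compile-time constant
        String fromRuntime = prefix + "min";     // concatenated at run time, not interned

        System.out.println(isAdminBuggy(fromLiteral));      // true
        System.out.println(isAdminBuggy(fromConstantExpr)); // true
        System.out.println(isAdminBuggy(fromRuntime));      // false: the bug shows up
        System.out.println(isAdmin(fromRuntime));           // true
    }
}
```

The buggy version passes every test that feeds it literals and only fails once the value is assembled at run time, for example when it is read from a file or built from user input.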

Manual Interning

You can intern a string explicitly by calling the intern() method, but doing so routinely is not advised. Historically, the big problem with intern has been that once you intern a string, you are stuck with it in RAM until the program ends: it is no longer eligible for garbage collection, even if there are no more references to it. If you want a temporary interned string, you might consider interning manually, that is, keeping your own pool of canonical strings (for example a map from each string to its canonical instance) that you can discard when you are done with it.
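A manual pool of that kind could look roughly like the sketch below. It is a minimal, single-threaded illustration built on a plain HashMap; the class ManualPool and its canonical method are hypothetical names, not a standard API.

```java
import java.util.HashMap;
import java.util.Map;

public class ManualPool {

    // Maps each distinct value to the one instance chosen as its canonical copy.
    private final Map<String, String> pool = new HashMap<>();

    // Returns the canonical instance for this value, registering it on first sight.
    public String canonical(String value) {
        String existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    // Number of distinct values currently held by the pool.
    public int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        ManualPool pool = new ManualPool();
        String a = new String("temp".toCharArray());
        String b = new String("temp".toCharArray());

        System.out.println(a == b);                                  // false
        System.out.println(pool.canonical(a) == pool.canonical(b));  // true
        System.out.println(pool.size());                             // 1
    }
}
```

Because the pool is an ordinary object, dropping the last reference to it makes the map and every string held only by it eligible for garbage collection, which is the "temporary" behaviour the paragraph above asks for.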

However, more recent JVMs (HotSpot, for example) usually hold the interned string table through weak references, so an interned string becomes eligible for garbage collection once it is no longer strongly referenced from anywhere else.

Further References :

• http://book.javanb.com/NET-For-Java-Developers-Migrating-To-Csharp/0672324024_ch12lev1sec2.html
• http://mindprod.com/jgloss/interned.html
• http://javatechniques.com/


