This blog has moved. The current URL of this post is http://aakinshin.net/en/blog/dotnet/mono-utf8-conversions/.
This post is a logical continuation of Jon Skeet's blog post “When is a string not a string?”. Jon showed some very interesting things about the behavior of ill-formed Unicode strings in .NET. I wondered how similar examples would work on Mono, and I got some very interesting results.
Experiment 1: Compilation
Let's take Jon's code with a small modification: we will just add a null check for text in DumpString:
using System;
using System.ComponentModel;
using System.Text;
using System.Linq;

[Description(Value)]
class Test
{
    const string Value = "X\ud800Y";

    static void Main()
    {
        var description = (DescriptionAttribute)typeof(Test)
            .GetCustomAttributes(typeof(DescriptionAttribute), true)[0];
        DumpString("Attribute", description.Description);
        DumpString("Constant", Value);
    }

    static void DumpString(string name, string text)
    {
        Console.Write("{0}: ", name);
        if (text != null)
        {
            var utf16 = text.Select(c => ((uint) c).ToString("x4"));
            Console.WriteLine(string.Join(" ", utf16));
        }
        else
            Console.WriteLine("null");
    }
}
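As a side note, the constant above is ill-formed because \ud800 is a high surrogate that is not followed by a low surrogate. The following is only a minimal sketch of how such strings could be detected. The class Utf16Check and the method IsWellFormed are hypothetical names introduced for illustration; they are not part of Jon's code or of this experiment.

```csharp
using System;

static class Utf16Check
{
    // Hypothetical helper (not from the original post): returns true when
    // every surrogate code unit in the string belongs to a valid
    // high/low surrogate pair.
    public static bool IsWellFormed(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c))
            {
                // A high surrogate must be immediately followed by a low surrogate.
                if (i + 1 >= text.Length || !char.IsLowSurrogate(text[i + 1]))
                    return false;
                i++; // skip the low half of the pair
            }
            else if (char.IsLowSurrogate(c))
            {
                // A low surrogate without a preceding high surrogate.
                return false;
            }
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine(IsWellFormed("X\ud800Y"));       // False: lone high surrogate
        Console.WriteLine(IsWellFormed("X\ud800\udc00Y")); // True: complete surrogate pair
    }
}
```

A check like this only says whether a string is well-formed UTF-16. It says nothing about what a particular compiler or runtime does with an ill-formed one, which is what the experiment looks at.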