Tuesday, 1 December 2009

System.Data Serialisation Bug - Scenario 2

Introduction

In my last post I explained that System.Data contains a bug which can cause it to throw an exception during serialisation or deserialisation of a DataSet which contains a column that uses System.Object as its .NET type.

The bug manifests itself in two known scenarios:

  • Serialising a DataSet containing a column of type System.Object using a System.Xml.XmlTextWriter instance created via the static Create method; this throws a System.ArgumentException.
  • Deserialising a DataSet containing a column of type System.Object after having successfully passed it across the wire via Windows Communication Foundation (WCF) using netTcp binding; this throws a System.Data.DataException.

This post covers the second of the two scenarios – deserialising a DataSet which has been successfully transmitted via WCF's netTcp binding.

System.Data.DataException

A couple of years ago I was asked by a client to port an existing ASMX-based Web Service to one using Windows Communication Foundation. The brief was to change as little of the client and service as possible, and to just replace the communications interface. In addition to passing simple and complex types across the wire, the application would often pass both typed and untyped DataSets (clearly a bad idea, but that's another story). One of the typed DataSets was based upon a SQL Server 2005 table containing a column of type SQL_VARIANT. Attempting to pass an instance of this DataSet from server to client produced the following exception on the client (i.e. during deserialisation):

System.Data.DataException, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
Undefined data type: 'xs:int'.

The XML fragment below represents an instance of the SQL_VARIANT column within the serialised typed DataSet. Note the xsi:type="xs:int" attribute, which declares the data type used by this particular row: although the column can contain data of any type, each row must explicitly state which data type it holds.

<Data xsi:type="xs:int" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">1</Data>

Note: Much of the investigation detailed herein was carried out with the help of .NET Reflector to study what some of the classes in the .NET Framework were actually doing.

During deserialisation, System.Data.Common.ObjectStorage.ConvertXmlToObject correctly detects that the element carries a type attribute in the http://www.w3.org/2001/XMLSchema-instance namespace. It then parses the attribute's value (xs:int in our example) by splitting it at the colon, and looks up the namespace prefix (xs in our case) to confirm which namespace it identifies (i.e. http://www.w3.org/2001/XMLSchema). This lookup, performed ultimately by the LookupNamespace method of the System.Xml.XmlBinaryReader implemented within WCF's System.Runtime.Serialization.dll, fails. It fails because the binary representation of the Data element in the stream provided by WCF is incorrect. That, in turn, is because the last call to WriteAttributeString made by the internal System.Data.XmlDataTreeWriter.XmlDataRowWriter method during serialisation of the DataSet uses the wrong overload. Read my Scenario 1 post for further details of this bug if you haven't done so already.
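The lookup described above can be sketched in a few lines of C#. This is not the code inside System.Data, only an illustration of the same steps (read xsi:type, split at the colon, resolve the prefix) run against the text form of the fragment, where the lookup succeeds. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

static class XsiTypeLookupSketch
{
    const string XsiNamespace = "http://www.w3.org/2001/XMLSchema-instance";

    // Returns the namespace URI that the prefix inside xsi:type resolves to,
    // or null when there is no xsi:type attribute or the prefix is undeclared.
    public static string ResolveXsiTypeNamespace(XmlReader reader)
    {
        string typeName = reader.GetAttribute("type", XsiNamespace); // e.g. "xs:int"
        if (typeName == null) return null;

        string[] parts = typeName.Split(':');
        if (parts.Length != 2) return null;

        // parts[0] is the prefix ("xs"); parts[1] is the local type name ("int").
        return reader.LookupNamespace(parts[0]);
    }

    // Convenience wrapper: parse a text XML fragment and resolve its xsi:type prefix.
    public static string ResolveFromText(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent(); // position the reader on the root element
            return ResolveXsiTypeNamespace(reader);
        }
    }

    static void Main()
    {
        string xml =
            "<Data xsi:type=\"xs:int\" xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" " +
            "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">1</Data>";
        Console.WriteLine(ResolveFromText(xml) ?? "(prefix not declared)");
    }
}
```

With a text reader the xs prefix resolves, because the parser sees xmlns:xs as a real namespace declaration. The failure only appears when the reader is handed a representation in which that declaration was recorded as an ordinary attribute.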

Representing XML as Binary

If I asked you to write out the XML for an element named el with a single attribute named att with a value val, where the el element is in the global namespace, the att attribute is in the namespace prefixed by tmp, and that namespace has been defined elsewhere, you'd write down:

<el tmp:att="val"/>

Now what if I asked you to write out the XML for an element named el with a single attribute named tmp:att with a value val, where the element el and the attribute tmp:att were both in the global namespace? Assuming you didn't object and tell me that tmp:att isn't a valid attribute name, you'd write down:

<el tmp:att="val"/>

The fact that you get exactly the same text is why System.Data has been able to get away with this bug for so long. When the XML is encoded as binary, things don't go so well.
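The same effect can be seen with the attribute involved in this bug. The sketch below is an illustration only: it assumes System.Xml.XmlTextWriter (constructed directly, so no name checking takes place) and writes the xmlns:xs declaration twice, once with the prefix crammed into the local name and once with prefix and local name passed separately. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

static class SameTextSketch
{
    const string XsNamespace = "http://www.w3.org/2001/XMLSchema";

    // Writes <Data xmlns:xs="..."/> as text, using one of two logically
    // different calls to describe the xmlns:xs attribute.
    public static string WriteData(bool prefixInsideLocalName)
    {
        var text = new StringWriter();
        using (var writer = new XmlTextWriter(text))
        {
            writer.WriteStartElement("Data");
            if (prefixInsideLocalName)
            {
                // One attribute whose *local name* is "xmlns:xs" (no prefix).
                writer.WriteAttributeString("xmlns:xs", XsNamespace);
            }
            else
            {
                // A namespace declaration: prefix "xmlns", local name "xs".
                writer.WriteAttributeString("xmlns", "xs", null, XsNamespace);
            }
            writer.WriteEndElement();
        }
        return text.ToString();
    }

    static void Main()
    {
        string crammed = WriteData(true);
        string separate = WriteData(false);
        Console.WriteLine(crammed);
        Console.WriteLine(separate);
        Console.WriteLine("Identical text: " + (crammed == separate));
    }
}
```

A text writer flattens both calls into the same characters, so nothing downstream can tell them apart. A writer that stores prefix and local name as separate fields has no such luxury.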

The key issue here is that as I'm using "netTcpBinding" the XML message uses a binary encoding, so let's take a look at how the message above would be encoded. To make sense of this you need to understand a little about how binary encoding of XML (as performed by WCF) behaves. Rather than transmitting XML as text, binary-encoded XML is transmitted as a series of tokens, each of which has a pre-defined meaning and defines the meaning of the bytes which follow. Some of the key tokens (from the perspective of our test message) are listed below. Please don't take this as a full description of these tokens – there are better sources on the net for that (e.g. MSDN). In particular, the lengths are not always single-byte lengths – the reality is a little more complex than that.

Hex | Token | Meaning | Following Bytes
--- | --- | --- | ---
0x01 | EndElement | Marks the end of the inner-most element | None
0x04 | MinAttribute | An attribute which belongs to the global namespace | A byte specifying the length of the attribute name, plus the attribute name itself
0x05 | Attribute | An attribute which belongs to a specified namespace | A byte specifying the length of the namespace prefix, plus the namespace prefix, plus the length of the attribute name, plus the attribute name itself
0x09 | XmlnsAttribute | A namespace declaration | A byte specifying the length of the namespace prefix, plus the namespace prefix, plus the length of the namespace URI, plus the namespace URI itself
0x40 | MinElement | An element which belongs to the global namespace | A byte specifying the length of the element name, plus the element name itself
0x41 | Element | An element which belongs to a specified namespace | A byte specifying the length of the namespace prefix, plus the namespace prefix, plus the length of the element name, plus the element name itself
0x98 | Chars8Text | Text encoded as UTF8 | A byte specifying the length of the text, plus the text itself

This is all done in the interests of reducing the amount of data which needs to be transmitted across the wire. For example, the XML <hello>world</hello> takes 20 bytes as text but just 15 bytes when binary encoded (0x40, 0x05, h, e, l, l, o, 0x98, 0x05, w, o, r, l, d, 0x01). That might not sound like a large saving, but it all adds up over the space of a large message. The encoding also helps with parsing.
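The byte count above can be checked with a small hand-rolled encoder. This is a sketch of the simplified token scheme in the table, not WCF's actual writer: it assumes single-byte lengths and only handles an element with a text child. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class SimplifiedBinaryXml
{
    // Token values taken from the table above.
    const byte EndElement = 0x01;
    const byte MinElement = 0x40;
    const byte Chars8Text = 0x98;

    // Encodes <name>text</name> as MinElement, Chars8Text, EndElement.
    // Assumption: both lengths fit in a single byte (kept deliberately
    // conservative here at 127 bytes each).
    public static byte[] EncodeSimpleElement(string name, string text)
    {
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);
        byte[] textBytes = Encoding.UTF8.GetBytes(text);
        if (nameBytes.Length > 127 || textBytes.Length > 127)
            throw new ArgumentException("This sketch only supports single-byte lengths.");

        var bytes = new List<byte>();
        bytes.Add(MinElement);              // start of an element with no prefix
        bytes.Add((byte)nameBytes.Length);  // length of the element name
        bytes.AddRange(nameBytes);          // the element name itself
        bytes.Add(Chars8Text);              // start of UTF8 text
        bytes.Add((byte)textBytes.Length);  // length of the text
        bytes.AddRange(textBytes);          // the text itself
        bytes.Add(EndElement);              // close the inner-most element
        return bytes.ToArray();
    }

    static void Main()
    {
        byte[] encoded = EncodeSimpleElement("hello", "world");
        int asText = Encoding.UTF8.GetByteCount("<hello>world</hello>");
        Console.WriteLine(BitConverter.ToString(encoded));
        Console.WriteLine(encoded.Length + " bytes binary, " + asText + " bytes as text");
        // Prints:
        // 40-05-68-65-6C-6C-6F-98-05-77-6F-72-6C-64-01
        // 15 bytes binary, 20 bytes as text
    }
}
```

A real binary writer may pick more compact tokens than this sketch does, for example a token that combines text with the end of the element, so the bytes WCF produces for the same XML can differ.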

Okay, with that knowledge in hand, let's take a look at the actual message which WCF passes across the wire for our XML sample above. The Token column marks the start of each token within the message, and I have excluded all the irrelevant data (i.e. those bytes before offset 964 and after offset 1079):

Offset Decimal Hex ASCII Token Comment
964 64 0x40 @ MinElement Represents an element with no namespace prefix
965 4 0x04 The length of the element's name (4)
966 68 0x44 D The 4 bytes starting here spell out "Data"
967 97 0x61 a
968 116 0x74 t
969 97 0x61 a
970 5 0x05 Attribute Represents an attribute with a namespace prefix
971 3 0x03 The length of the attribute's namespace prefix (3)
972 120 0x78 x The 3 bytes starting here spell out "xsi"
973 115 0x73 s
974 105 0x69 i
975 4 0x04 The length of the attribute's local name (4)
976 116 0x74 t The 4 bytes starting here spell out "type"
977 121 0x79 y
978 112 0x70 p
979 101 0x65 e
980 152 0x98 ˜ Chars8Text Represents UTF8 text (the attribute's value)
981 6 0x06 The length of the attribute's value (6)
982 120 0x78 x The 6 bytes starting here spell out "xs:int"
983 115 0x73 s
984 58 0x3A :
985 105 0x69 i
986 110 0x6E n
987 116 0x74 t
988 4 0x04 MinAttribute Represents an attribute with no namespace prefix
989 8 0x08 The length of the attribute's name (8)
990 120 0x78 x The 8 bytes starting here spell out "xmlns:xs"
991 109 0x6D m
992 108 0x6C l
993 110 0x6E n
994 115 0x73 s
995 58 0x3A :
996 120 0x78 x
997 115 0x73 s
998 152 0x98 ˜ Chars8Text Represents UTF8 text (the attribute's value)
999 32 0x20 The length of the attribute's value (32)
1000 104 0x68 h The 32 bytes starting here spell out "http://www.w3.org/2001/XMLSchema"
1001 116 0x74 t
1002 116 0x74 t
1003 112 0x70 p
1004 58 0x3A :
1005 47 0x2F /
1006 47 0x2F /
1007 119 0x77 w
1008 119 0x77 w
1009 119 0x77 w
1010 46 0x2E .
1011 119 0x77 w
1012 51 0x33 3
1013 46 0x2E .
1014 111 0x6F o
1015 114 0x72 r
1016 103 0x67 g
1017 47 0x2F /
1018 50 0x32 2
1019 48 0x30 0
1020 48 0x30 0
1021 49 0x31 1
1022 47 0x2F /
1023 88 0x58 X
1024 77 0x4D M
1025 76 0x4C L
1026 83 0x53 S
1027 99 0x63 c
1028 104 0x68 h
1029 101 0x65 e
1030 109 0x6D m
1031 97 0x61 a
1032 9 0x09 XmlnsAttribute Represents a namespace declaration.
1033 3 0x03 The length of the namespace prefix (3)
1034 120 0x78 x The 3 bytes starting here spell out "xsi"
1035 115 0x73 s
1036 105 0x69 i
1037 41 0x29 ) The length of the namespace URI (41)
1038 104 0x68 h The 41 bytes here spell out "http://www.w3.org/2001/XMLSchema-instance"
1039 116 0x74 t
1040 116 0x74 t
1041 112 0x70 p
1042 58 0x3A :
1043 47 0x2F /
1044 47 0x2F /
1045 119 0x77 w
1046 119 0x77 w
1047 119 0x77 w
1048 46 0x2E .
1049 119 0x77 w
1050 51 0x33 3
1051 46 0x2E .
1052 111 0x6F o
1053 114 0x72 r
1054 103 0x67 g
1055 47 0x2F /
1056 50 0x32 2
1057 48 0x30 0
1058 48 0x30 0
1059 49 0x31 1
1060 47 0x2F /
1061 88 0x58 X
1062 77 0x4D M
1063 76 0x4C L
1064 83 0x53 S
1065 99 0x63 c
1066 104 0x68 h
1067 101 0x65 e
1068 109 0x6D m
1069 97 0x61 a
1070 45 0x2D -
1071 105 0x69 i
1072 110 0x6E n
1073 115 0x73 s
1074 116 0x74 t
1075 97 0x61 a
1076 110 0x6E n
1077 99 0x63 c
1078 101 0x65 e
1079 131 0x83 ƒ OneTextWithEndElement Represents the text "1" and the closing of the inner-most element

Did you spot the problem? Take a look at offset 988. The 0x04 byte there identifies a MinAttribute token – an attribute which has a name and value, but no namespace prefix. The attribute's name is then given as "xmlns:xs". But that's clearly not a name – that's a namespace prefix and a name separated by a colon. As a result of this error, the namespace prefix "xs" does not get added to the collection of namespaces referenced by the LookupNamespace method. So when the "xs:int" text at offset 980 is parsed into a namespace prefix and local name, that namespace prefix cannot be found by LookupNamespace and the exception is thrown.
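The effect can be reproduced without System.Data or WCF hosting at all. The sketch below is an assumption-laden illustration: it writes the Data element with XmlDictionaryWriter.CreateBinaryWriter (the public route to the binary writer), once with the prefix crammed into the local name, as System.Data does, and once with prefix and local name passed separately, then asks the binary reader what xs resolves to. The class and method names are hypothetical, and the behaviour is as described in this post for the .NET Framework; other runtimes may differ.

```csharp
using System;
using System.IO;
using System.Xml;

static class BinaryPrefixLossSketch
{
    const string XsNamespace = "http://www.w3.org/2001/XMLSchema";
    const string XsiNamespace = "http://www.w3.org/2001/XMLSchema-instance";

    // Writes <Data xsi:type="xs:int" xmlns:xs="...">1</Data> as binary XML and
    // returns what the binary reader resolves the "xs" prefix to (or null).
    public static string LookupXsAfterRoundTrip(bool prefixInsideLocalName)
    {
        var stream = new MemoryStream();
        using (XmlDictionaryWriter writer = XmlDictionaryWriter.CreateBinaryWriter(stream))
        {
            writer.WriteStartElement("Data");
            writer.WriteAttributeString("xsi", "type", XsiNamespace, "xs:int");
            if (prefixInsideLocalName)
            {
                // The shape of the faulty call: local name "xmlns:xs", no prefix.
                writer.WriteAttributeString("xmlns:xs", XsNamespace);
            }
            else
            {
                // Prefix and local name passed separately.
                writer.WriteAttributeString("xmlns", "xs", null, XsNamespace);
            }
            writer.WriteString("1");
            writer.WriteEndElement();
            writer.Flush();
        }

        using (XmlDictionaryReader reader = XmlDictionaryReader.CreateBinaryReader(
                   stream.ToArray(), XmlDictionaryReaderQuotas.Max))
        {
            reader.MoveToContent(); // position the reader on the Data element
            return reader.LookupNamespace("xs");
        }
    }

    static void Main()
    {
        Console.WriteLine("Crammed local name: " +
            (LookupXsAfterRoundTrip(true) ?? "(xs not declared)"));
        Console.WriteLine("Separate prefix:    " +
            (LookupXsAfterRoundTrip(false) ?? "(xs not declared)"));
    }
}
```

When the lookup for the first case comes back empty, that is the same condition which leads ConvertXmlToObject to report "Undefined data type: 'xs:int'".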

Working Around the Bug

The approach we took when we encountered this bug in Scenario 1 was to change the way the System.Xml.XmlWriter which serialised the DataSet was created such that it didn't object to the issue. But that won't help us here – we need to actually stop the issue occurring or the binary-encoded XML will simply not deserialise.

As we're dealing with a typed DataSet here, Visual Studio (via System.Data.Design.TypedDataSetGenerator) will have auto-generated a C# class for the DataSet. That class will inherit from System.Data.DataSet and will therefore inherit the bug. Now, a System.Data.DataSet can be serialised as XML by virtue of the fact that it implements System.Xml.Serialization.IXmlSerializable. So let's explicitly implement System.Xml.Serialization.IXmlSerializable in the derived class. The class thus changes from:

public partial class MyTypedDataSet : System.Data.DataSet
{
  // rest of class goes here
}

to:

public partial class MyTypedDataSet : System.Data.DataSet, System.Xml.Serialization.IXmlSerializable
{
  void System.Xml.Serialization.IXmlSerializable.WriteXml(System.Xml.XmlWriter writer)
  {
    base.WriteXml(writer);
  }

  System.Xml.Schema.XmlSchema System.Xml.Serialization.IXmlSerializable.GetSchema()
  {
    return null;
  }

  void System.Xml.Serialization.IXmlSerializable.ReadXml(System.Xml.XmlReader reader)
  {
    base.ReadXml(reader);
  }

  // rest of class goes here
}

I know what you're thinking – what's the point? Well, we now have control of the System.Xml.XmlWriter which is passed to the WriteXml method of the base class. So we can replace this line:

base.WriteXml(writer);

with this one:

base.WriteXml(new MyCustomXmlWriter(writer));

and thereby have the DataSet use MyCustomXmlWriter rather than the default writer (WCF's System.Xml.XmlBinaryWriter). You'll note that MyCustomXmlWriter accepts the original writer as a constructor parameter; it does this so it can simply forward every call it receives on to the original writer. I won't list the entire MyCustomXmlWriter implementation, but you'll get the idea from the following:

public class MyCustomXmlWriter : System.Xml.XmlWriter
{
  System.Xml.XmlWriter _innerXmlWriter;

  public MyCustomXmlWriter(System.Xml.XmlWriter innerXmlWriter)
  {
    _innerXmlWriter = innerXmlWriter;
  }

  public override void Close()
  {
    _innerXmlWriter.Close();
  }

  public override void Flush()
  {
    _innerXmlWriter.Flush();
  }

  // etc.
}

We override every method within MyCustomXmlWriter and simply make the identical call to the XmlWriter which was passed to our constructor (_innerXmlWriter). Again, we don't seem to have gained a lot. Well, you'll recall that the call which System.Data.XmlDataTreeWriter.XmlDataRowWriter is getting wrong is the call to System.Xml.XmlWriter.WriteAttributeString, so if we intercept that call and correct it we've solved the problem. As it happens, WriteAttributeString can't be overridden so it looks like we might be out of luck. But let's take a look at what the two-string overload of WriteAttributeString actually does using .NET Reflector:

public void WriteAttributeString(string localName, string value)
{
  this.WriteStartAttribute(null, localName, null);
  this.WriteString(value);
  this.WriteEndAttribute();
}

That's interesting, because WriteStartAttribute can be overridden; it takes the parameters prefix (set to null), localName (passed from the caller) and ns (set to null). With this knowledge in hand we can create an override of WriteStartAttribute which doesn't simply pass the call straight through to _innerXmlWriter, but looks for the situation caused by the bug and corrects for it:

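public override void WriteStartAttribute(string prefix, string localName, string ns)
{
  // The bug shows up as a call with no prefix and no namespace,
  // but with a "prefix:name" pair crammed into localName.
  if (prefix == null && ns == null && localName != null && localName.Contains(":"))
  {
    // Split at the first colon only and pass the two halves separately.
    string[] array = localName.Split(new char[] { ':' }, 2);
    _innerXmlWriter.WriteStartAttribute(array[0], array[1], null);
  }
  else
  {
    // Anything else is forwarded untouched.
    _innerXmlWriter.WriteStartAttribute(prefix, localName, ns);
  }
}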
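For reference, a complete forwarding writer is not much longer than the fragments above. The following is a sketch rather than the exact class used on the project: the names PrefixRepairingXmlWriter and PrefixRepairingDemo are hypothetical, and the demo at the bottom assumes XmlDictionaryWriter.CreateBinaryWriter as a stand-in for the binary writer which WCF supplies.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical name; plays the same role as MyCustomXmlWriter.
public class PrefixRepairingXmlWriter : XmlWriter
{
    private readonly XmlWriter _inner;

    public PrefixRepairingXmlWriter(XmlWriter inner) { _inner = inner; }

    // The one call that is corrected rather than forwarded verbatim.
    public override void WriteStartAttribute(string prefix, string localName, string ns)
    {
        if (prefix == null && ns == null && localName != null)
        {
            int colon = localName.IndexOf(':');
            if (colon > 0 && colon < localName.Length - 1)
            {
                // "xmlns:xs" becomes prefix "xmlns" and local name "xs".
                _inner.WriteStartAttribute(
                    localName.Substring(0, colon), localName.Substring(colon + 1), null);
                return;
            }
        }
        _inner.WriteStartAttribute(prefix, localName, ns);
    }

    // Everything below simply forwards to the wrapped writer.
    public override WriteState WriteState { get { return _inner.WriteState; } }
    public override string XmlLang { get { return _inner.XmlLang; } }
    public override XmlSpace XmlSpace { get { return _inner.XmlSpace; } }
    public override void Close() { _inner.Close(); }
    public override void Flush() { _inner.Flush(); }
    public override string LookupPrefix(string ns) { return _inner.LookupPrefix(ns); }
    public override void WriteStartDocument() { _inner.WriteStartDocument(); }
    public override void WriteStartDocument(bool standalone) { _inner.WriteStartDocument(standalone); }
    public override void WriteEndDocument() { _inner.WriteEndDocument(); }
    public override void WriteDocType(string name, string pubid, string sysid, string subset) { _inner.WriteDocType(name, pubid, sysid, subset); }
    public override void WriteStartElement(string prefix, string localName, string ns) { _inner.WriteStartElement(prefix, localName, ns); }
    public override void WriteEndElement() { _inner.WriteEndElement(); }
    public override void WriteFullEndElement() { _inner.WriteFullEndElement(); }
    public override void WriteEndAttribute() { _inner.WriteEndAttribute(); }
    public override void WriteCData(string text) { _inner.WriteCData(text); }
    public override void WriteComment(string text) { _inner.WriteComment(text); }
    public override void WriteProcessingInstruction(string name, string text) { _inner.WriteProcessingInstruction(name, text); }
    public override void WriteEntityRef(string name) { _inner.WriteEntityRef(name); }
    public override void WriteCharEntity(char ch) { _inner.WriteCharEntity(ch); }
    public override void WriteWhitespace(string ws) { _inner.WriteWhitespace(ws); }
    public override void WriteString(string text) { _inner.WriteString(text); }
    public override void WriteSurrogateCharEntity(char lowChar, char highChar) { _inner.WriteSurrogateCharEntity(lowChar, highChar); }
    public override void WriteChars(char[] buffer, int index, int count) { _inner.WriteChars(buffer, index, count); }
    public override void WriteRaw(char[] buffer, int index, int count) { _inner.WriteRaw(buffer, index, count); }
    public override void WriteRaw(string data) { _inner.WriteRaw(data); }
    public override void WriteBase64(byte[] buffer, int index, int count) { _inner.WriteBase64(buffer, index, count); }
}

static class PrefixRepairingDemo
{
    // Makes the faulty two-string call through the wrapper, then asks a
    // binary reader what the "xs" prefix resolves to.
    public static string LookupXsThroughWrapper()
    {
        var stream = new MemoryStream();
        using (XmlDictionaryWriter binary = XmlDictionaryWriter.CreateBinaryWriter(stream))
        {
            XmlWriter writer = new PrefixRepairingXmlWriter(binary);
            writer.WriteStartElement("Data");
            writer.WriteAttributeString("xmlns:xs", "http://www.w3.org/2001/XMLSchema");
            writer.WriteString("1");
            writer.WriteEndElement();
            writer.Flush();
        }
        using (XmlDictionaryReader reader = XmlDictionaryReader.CreateBinaryReader(
                   stream.ToArray(), XmlDictionaryReaderQuotas.Max))
        {
            reader.MoveToContent();
            return reader.LookupNamespace("xs");
        }
    }

    static void Main()
    {
        Console.WriteLine(LookupXsThroughWrapper() ?? "(xs not declared)");
    }
}
```

Because the non-virtual WriteAttributeString overloads all funnel into WriteStartAttribute, WriteString and WriteEndAttribute, intercepting WriteStartAttribute alone is enough to catch the faulty call.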

The down-side to this work-around is that it requires a small snippet of code to be added to the auto-generated C# class for the typed DataSet. If the class is re-generated, these changes will be discarded. If anyone has a better work-around, I'd love to hear about it.
