Introduction
In my last post I explained that System.Data contains a bug which can cause it to throw an exception during serialisation or deserialisation of a DataSet which contains a column that uses System.Object as its .NET type.
The bug manifests itself in two known scenarios:
- Serialising a DataSet containing a column of type System.Object using a System.Xml.XmlTextWriter instance created via the static Create method; this throws a System.ArgumentException.
- Deserialising a DataSet containing a column of type System.Object after having successfully passed it across the wire via Windows Communication Foundation (WCF) using netTcp binding; this throws a System.Data.DataException.
This post covers the second of the two scenarios – deserialising a DataSet which has been successfully transmitted via WCF's netTcp binding.
System.Data.DataException
A couple of years ago I was asked by a client to port an existing ASMX-based Web Service to one using Windows Communication Foundation. The brief was to change as little of the client and service as possible, and to just replace the communications interface. In addition to passing simple and complex types across the wire, the application would often pass both typed and untyped DataSets (clearly a bad idea, but that's another story). One of the typed DataSets was based upon a SQL Server 2005 table containing a column of type SQL_VARIANT. Attempting to pass an instance of this DataSet from server to client produced the following exception on the client (i.e. during deserialisation):
System.Data.DataException, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089 Undefined data type: 'xs:int'.
The XML fragment below shows the SQL_VARIANT column for one row of the serialised typed DataSet. Note the xsi:type="xs:int" attribute, which states the data type held in the column for that particular row: although the column can contain data of any type, each row must explicitly declare which type it is using.
<Data xsi:type="xs:int" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">1</Data>
Note: Much of the investigation detailed herein was carried out with the help of .NET Reflector to study what some of the classes in the .NET Framework were actually doing.
During deserialization, System.Data.Common.ObjectStorage.ConvertXmlToObject correctly identifies the fact that a type attribute within the namespace http://www.w3.org/2001/XMLSchema-instance exists on the element. It then parses the value of the type attribute (which, in our example, is xs:int) by splitting it at the colon. It performs a namespace look-up using the namespace prefix (xs in our case) in an attempt to confirm which namespace is identified by that prefix (i.e. http://www.w3.org/2001/XMLSchema). This lookup, performed ultimately by the LookupNamespace method of the System.Xml.XmlBinaryReader implemented within WCF's System.Runtime.Serialization.dll, fails. The reason it fails is that the binary representation of the Data element is incorrect in the stream provided by WCF. This, in turn, is wrong because the last call to WriteAttributeString made by the internal System.Data.XmlDataTreeWriter.XmlDataRowWriter method during serialisation of the DataSet calls the wrong overload. Read my Scenario 1 post for further details of this bug if you haven't done so already.
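The prefix lookup described above can be approximated with public API. The following is a minimal sketch, not the internal ObjectStorage code: the class and method names are hypothetical, and it reads the fragment as text with a standard XmlReader (where the lookup succeeds) rather than through WCF's binary reader.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XsiTypeLookupSketch
{
    const string XsiNamespace = "http://www.w3.org/2001/XMLSchema-instance";

    // Returns the namespace URI bound to the prefix used in the element's
    // xsi:type value, or null when there is no xsi:type attribute or the
    // prefix is not declared in scope.
    public static string ResolveTypeNamespace(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent(); // position on the root element

            string typeValue = reader.GetAttribute("type", XsiNamespace);
            if (typeValue == null)
            {
                return null;
            }

            // Split "xs:int" at the colon to obtain the prefix "xs".
            int colon = typeValue.IndexOf(':');
            string prefix = colon < 0 ? string.Empty : typeValue.Substring(0, colon);

            // Ask the reader which namespace the prefix maps to.
            return reader.LookupNamespace(prefix);
        }
    }
}
```

Passing the Data fragment shown earlier to ResolveTypeNamespace yields the XML Schema namespace URI; in the failing WCF scenario the equivalent lookup comes back empty, which is what leads to the exception.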
Representing XML as Binary
If I asked you to write out the XML for an element named el with a single attribute named att with a value val, where the el element is in the global namespace, the att attribute is in the namespace prefixed by tmp, and that namespace has been defined elsewhere, you'd write down:
<el tmp:att="val"/>
Now what if I asked you to write out the XML for an element named el with a single attribute named tmp:att with a value val, where the element el and the attribute tmp:att were both in the global namespace? Assuming you didn't object and tell me that tmp:att isn't a valid attribute name, you'd write down:
<el tmp:att="val"/>
The fact that you get exactly the same text is why System.Data has been able to get away with this bug for so long. When the XML is encoded as binary, things don't go so well.
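To see the two calls collapse into the same text, here is a small sketch using System.Xml.XmlTextWriter, which does not validate attribute names. The class name, the root wrapper element and the urn:example:tmp URI are assumptions made for illustration; the root element exists only so that the tmp prefix is "defined elsewhere".

```csharp
using System;
using System.IO;
using System.Xml;

public static class SameTextSketch
{
    const string TmpNamespace = "urn:example:tmp"; // hypothetical namespace URI

    // Writes <root xmlns:tmp="..."><el .../></root>, delegating the single
    // attribute on <el> to the supplied callback.
    static string Render(Action<XmlWriter> writeAttribute)
    {
        var text = new StringWriter();
        using (var writer = new XmlTextWriter(text))
        {
            writer.WriteStartElement("root");
            writer.WriteAttributeString("xmlns", "tmp", null, TmpNamespace);
            writer.WriteStartElement("el");
            writeAttribute(writer);
            writer.WriteEndElement();
            writer.WriteEndElement();
        }
        return text.ToString();
    }

    // The intended call: prefix, local name and namespace passed separately.
    public static string WithPrefixAndNamespace()
    {
        return Render(w => w.WriteAttributeString("tmp", "att", TmpNamespace, "val"));
    }

    // The problematic call: the colon is buried inside the local name.
    public static string WithColonInLocalName()
    {
        return Render(w => w.WriteAttributeString("tmp:att", "val"));
    }
}
```

Both methods should return identical strings, which is why a text-based transport never notices the difference.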
The key issue here is that, because I'm using netTcpBinding, the XML message uses a binary encoding, so let's take a look at how the message above would be encoded. To make sense of this you need to understand a little about how binary encoding of XML (as performed by WCF) behaves. Rather than transmitting XML as text, binary-encoded XML is transmitted as a series of tokens, each of which has a pre-defined meaning and defines the meaning of the bytes which follow. Some of the key tokens (from the perspective of our test message) are listed below. Please don't take this as a full description of these tokens; there are better sources on the net for that (e.g. MSDN). In particular, the lengths are not always single-byte lengths; the reality is a little more complex than that.
Hex | Token | Meaning | Following Bytes |
0x01 | EndElement | Marks the end of the inner-most element | None |
0x04 | MinAttribute | An attribute which belongs to the global namespace | A byte specifying the length of the attribute name, plus the attribute name itself |
0x05 | Attribute | An attribute which belongs to a specified namespace | A byte specifying the length of the namespace prefix, plus the namespace prefix, plus the length of the attribute name, plus the attribute name itself |
0x09 | XmlnsAttribute | A namespace declaration | A byte specifying the length of the namespace prefix, plus the namespace prefix, plus the length of the namespace URI, plus the namespace URI itself |
0x40 | MinElement | An element which belongs to the global namespace | A byte specifying the length of the element name, plus the element name itself |
0x41 | Element | An element which belongs to a specified namespace | A byte specifying the length of the namespace prefix, plus the namespace prefix, plus the length of the element name, plus the element name itself |
0x98 | Chars8Text | Text encoded as UTF8 | A byte specifying the length of the text, plus the text itself |
This is all done in the interests of reducing the amount of data which needs to be transmitted across the wire. For example, the XML <hello>world</hello> takes 20 bytes as text but just 15 bytes when binary encoded (0x40, 0x05, h, e, l, l, o, 0x98, 0x05, w, o, r, l, d, 0x01). That might not sound like a large saving, but it all adds up over the space of a large message. The encoding also helps with parsing.
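The byte count above can be reproduced with a toy encoder that follows only the simplified token table in this post. This is a sketch under that assumption and is not WCF's actual encoder (which has more record types and multi-byte lengths); the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SimplifiedBinaryXml
{
    const byte EndElement = 0x01;
    const byte MinElement = 0x40;
    const byte Chars8Text = 0x98;

    // Appends a one-byte length followed by the UTF-8 bytes of the value.
    static void AppendLengthPrefixed(List<byte> output, string value)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(value);
        if (utf8.Length > 127)
        {
            throw new ArgumentException("This sketch only handles single-byte lengths.");
        }
        output.Add((byte)utf8.Length);
        output.AddRange(utf8);
    }

    // Encodes <name>text</name> for an element in the global namespace.
    public static byte[] EncodeTextElement(string name, string text)
    {
        var output = new List<byte>();
        output.Add(MinElement);
        AppendLengthPrefixed(output, name);
        output.Add(Chars8Text);
        AppendLengthPrefixed(output, text);
        output.Add(EndElement);
        return output.ToArray();
    }
}
```

Calling EncodeTextElement("hello", "world") gives the 15 bytes listed above, against 20 bytes for the same element written as UTF-8 text.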
Okay, with that knowledge in hand, let's take a look at the actual message which WCF passes across the wire for our XML sample above. The Token column marks the start of each token within the message, and I have excluded all the irrelevant data (i.e. those bytes before offset 964 and after offset 1079):
Offset | Decimal | Hex | ASCII | Token | Comment |
964 | 64 | 0x40 | @ | MinElement | Represents an element with no namespace prefix |
965 | 4 | 0x04 | | | The length of the element's name (4) |
966 | 68 | 0x44 | D | | The 4 bytes starting here spell out "Data" |
967 | 97 | 0x61 | a | | |
968 | 116 | 0x74 | t | | |
969 | 97 | 0x61 | a | | |
970 | 5 | 0x05 | | Attribute | Represents an attribute with a namespace prefix |
971 | 3 | 0x03 | | | The length of the attribute's namespace prefix (3) |
972 | 120 | 0x78 | x | | The 3 bytes starting here spell out "xsi" |
973 | 115 | 0x73 | s | | |
974 | 105 | 0x69 | i | | |
975 | 4 | 0x04 | | | The length of the attribute's local name (4) |
976 | 116 | 0x74 | t | | The 4 bytes starting here spell out "type" |
977 | 121 | 0x79 | y | | |
978 | 112 | 0x70 | p | | |
979 | 101 | 0x65 | e | | |
980 | 152 | 0x98 | ˜ | Chars8Text | Represents UTF8 text (the attribute's value) |
981 | 6 | 0x06 | | | The length of the attribute's value (6) |
982 | 120 | 0x78 | x | | The 6 bytes starting here spell out "xs:int" |
983 | 115 | 0x73 | s | | |
984 | 58 | 0x3A | : | | |
985 | 105 | 0x69 | i | | |
986 | 110 | 0x6E | n | | |
987 | 116 | 0x74 | t | | |
988 | 4 | 0x04 | | MinAttribute | Represents an attribute with no namespace prefix |
989 | 8 | 0x08 | | | The length of the attribute's name (8) |
990 | 120 | 0x78 | x | | The 8 bytes starting here spell out "xmlns:xs" |
991 | 109 | 0x6D | m | | |
992 | 108 | 0x6C | l | | |
993 | 110 | 0x6E | n | | |
994 | 115 | 0x73 | s | | |
995 | 58 | 0x3A | : | | |
996 | 120 | 0x78 | x | | |
997 | 115 | 0x73 | s | | |
998 | 152 | 0x98 | ˜ | Chars8Text | Represents UTF8 text (the attribute's value) |
999 | 32 | 0x20 | | | The length of the attribute's value (32) |
1000 | 104 | 0x68 | h | | The 32 bytes starting here spell out "http://www.w3.org/2001/XMLSchema" |
1001 | 116 | 0x74 | t | | |
1002 | 116 | 0x74 | t | | |
1003 | 112 | 0x70 | p | | |
1004 | 58 | 0x3A | : | | |
1005 | 47 | 0x2F | / | | |
1006 | 47 | 0x2F | / | | |
1007 | 119 | 0x77 | w | | |
1008 | 119 | 0x77 | w | | |
1009 | 119 | 0x77 | w | | |
1010 | 46 | 0x2E | . | | |
1011 | 119 | 0x77 | w | | |
1012 | 51 | 0x33 | 3 | | |
1013 | 46 | 0x2E | . | | |
1014 | 111 | 0x6F | o | | |
1015 | 114 | 0x72 | r | | |
1016 | 103 | 0x67 | g | | |
1017 | 47 | 0x2F | / | | |
1018 | 50 | 0x32 | 2 | | |
1019 | 48 | 0x30 | 0 | | |
1020 | 48 | 0x30 | 0 | | |
1021 | 49 | 0x31 | 1 | | |
1022 | 47 | 0x2F | / | | |
1023 | 88 | 0x58 | X | | |
1024 | 77 | 0x4D | M | | |
1025 | 76 | 0x4C | L | | |
1026 | 83 | 0x53 | S | | |
1027 | 99 | 0x63 | c | | |
1028 | 104 | 0x68 | h | | |
1029 | 101 | 0x65 | e | | |
1030 | 109 | 0x6D | m | | |
1031 | 97 | 0x61 | a | | |
1032 | 9 | 0x09 | | XmlnsAttribute | Represents a namespace declaration |
1033 | 3 | 0x03 | | | The length of the namespace prefix (3) |
1034 | 120 | 0x78 | x | | The 3 bytes starting here spell out "xsi" |
1035 | 115 | 0x73 | s | | |
1036 | 105 | 0x69 | i | | |
1037 | 41 | 0x29 | ) | | The length of the namespace URI (41) |
1038 | 104 | 0x68 | h | | The 41 bytes starting here spell out "http://www.w3.org/2001/XMLSchema-instance" |
1039 | 116 | 0x74 | t | | |
1040 | 116 | 0x74 | t | | |
1041 | 112 | 0x70 | p | | |
1042 | 58 | 0x3A | : | | |
1043 | 47 | 0x2F | / | | |
1044 | 47 | 0x2F | / | | |
1045 | 119 | 0x77 | w | | |
1046 | 119 | 0x77 | w | | |
1047 | 119 | 0x77 | w | | |
1048 | 46 | 0x2E | . | | |
1049 | 119 | 0x77 | w | | |
1050 | 51 | 0x33 | 3 | | |
1051 | 46 | 0x2E | . | | |
1052 | 111 | 0x6F | o | | |
1053 | 114 | 0x72 | r | | |
1054 | 103 | 0x67 | g | | |
1055 | 47 | 0x2F | / | | |
1056 | 50 | 0x32 | 2 | | |
1057 | 48 | 0x30 | 0 | | |
1058 | 48 | 0x30 | 0 | | |
1059 | 49 | 0x31 | 1 | | |
1060 | 47 | 0x2F | / | | |
1061 | 88 | 0x58 | X | | |
1062 | 77 | 0x4D | M | | |
1063 | 76 | 0x4C | L | | |
1064 | 83 | 0x53 | S | | |
1065 | 99 | 0x63 | c | | |
1066 | 104 | 0x68 | h | | |
1067 | 101 | 0x65 | e | | |
1068 | 109 | 0x6D | m | | |
1069 | 97 | 0x61 | a | | |
1070 | 45 | 0x2D | - | | |
1071 | 105 | 0x69 | i | | |
1072 | 110 | 0x6E | n | | |
1073 | 115 | 0x73 | s | | |
1074 | 116 | 0x74 | t | | |
1075 | 97 | 0x61 | a | | |
1076 | 110 | 0x6E | n | | |
1077 | 99 | 0x63 | c | | |
1078 | 101 | 0x65 | e | | |
1079 | 131 | 0x83 | ƒ | OneTextWithEndElement | Represents the text "1" and the closing of the inner-most element |
Did you spot the problem? Take a look at offset 988. The 0x04 byte there identifies a MinAttribute token – an attribute which has a name and value, but no namespace prefix. The attribute's name is then given as "xmlns:xs". But that's clearly not a name – that's a namespace prefix and a name separated by a colon. As a result of this error, the namespace prefix "xs" does not get added to the collection of namespaces referenced by the LookupNamespace method. So when the "xs:int" text at offset 980 is parsed into a namespace prefix and local name, that namespace prefix cannot be found by LookupNamespace and the exception is thrown.
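The effect on the prefix table can be illustrated with a toy scanner over the simplified token layout described in this post. This is a sketch under that assumption, not WCF's XmlBinaryReader: all names are hypothetical, only single-byte lengths and Chars8Text values are handled, and only XmlnsAttribute tokens register a prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ToyNamespaceScan
{
    const byte MinAttribute = 0x04, Attribute = 0x05, XmlnsAttribute = 0x09, Chars8Text = 0x98;

    static string ReadString(byte[] data, ref int pos)
    {
        int length = data[pos++];
        string value = Encoding.UTF8.GetString(data, pos, length);
        pos += length;
        return value;
    }

    static void SkipValue(byte[] data, ref int pos)
    {
        if (data[pos++] != Chars8Text) throw new NotSupportedException("Only Chars8Text values are handled.");
        ReadString(data, ref pos);
    }

    // Walks a run of attribute tokens and returns the prefixes they declare.
    public static Dictionary<string, string> DeclaredPrefixes(byte[] attributes)
    {
        var prefixes = new Dictionary<string, string>();
        int pos = 0;
        while (pos < attributes.Length)
        {
            switch (attributes[pos++])
            {
                case MinAttribute: // name + value: never treated as a declaration
                    ReadString(attributes, ref pos);
                    SkipValue(attributes, ref pos);
                    break;
                case Attribute: // prefix + local name + value
                    ReadString(attributes, ref pos);
                    ReadString(attributes, ref pos);
                    SkipValue(attributes, ref pos);
                    break;
                case XmlnsAttribute: // prefix + URI: the only token that declares a prefix
                    string prefix = ReadString(attributes, ref pos);
                    prefixes[prefix] = ReadString(attributes, ref pos);
                    break;
                default:
                    throw new NotSupportedException("Unexpected token.");
            }
        }
        return prefixes;
    }

    static void Append(List<byte> output, string value)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(value);
        output.Add((byte)utf8.Length);
        output.AddRange(utf8);
    }

    // The bytes as they appear at offset 988: MinAttribute named "xmlns:xs".
    public static byte[] AsWritten(string uri)
    {
        var output = new List<byte> { MinAttribute };
        Append(output, "xmlns:xs");
        output.Add(Chars8Text);
        Append(output, uri);
        return output.ToArray();
    }

    // The bytes a correct call would have produced: XmlnsAttribute for "xs".
    public static byte[] AsIntended(string uri)
    {
        var output = new List<byte> { XmlnsAttribute };
        Append(output, "xs");
        Append(output, uri);
        return output.ToArray();
    }
}
```

Scanning the AsWritten bytes leaves the prefix table empty, so a later lookup of "xs" has nothing to find; scanning the AsIntended bytes maps "xs" to the schema namespace.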
Working Around the Bug
The approach we took when we encountered this bug in Scenario 1 was to change the way the System.Xml.XmlWriter which serialised the DataSet was created such that it didn't object to the issue. But that won't help us here – we need to actually stop the issue occurring or the binary-encoded XML will simply not deserialise.
As we're dealing with a typed DataSet here, Visual Studio (via System.Data.Design.TypedDataSetGenerator) will have auto-generated a C# class for the DataSet. That class will inherit from System.Data.DataSet and will therefore inherit the bug. Now, a System.Data.DataSet can be serialised as XML by virtue of the fact that it implements System.Xml.Serialization.IXmlSerializable. So let's explicitly implement System.Xml.Serialization.IXmlSerializable in the derived class. The class thus changes from:
    public partial class MyTypedDataSet : System.Data.DataSet
    {
        // rest of class goes here
    }
to:
    public partial class MyTypedDataSet : System.Data.DataSet, System.Xml.Serialization.IXmlSerializable
    {
        void System.Xml.Serialization.IXmlSerializable.WriteXml(System.Xml.XmlWriter writer)
        {
            base.WriteXml(writer);
        }

        System.Xml.Schema.XmlSchema System.Xml.Serialization.IXmlSerializable.GetSchema()
        {
            return null;
        }

        void System.Xml.Serialization.IXmlSerializable.ReadXml(System.Xml.XmlReader reader)
        {
            base.ReadXml(reader);
        }

        // rest of class goes here
    }
I know what you're thinking – what's the point? Well, we now have control of the System.Xml.XmlWriter which is passed to the WriteXml method of the base class. So we can replace this line:
base.WriteXml(writer);
with this one:
base.WriteXml(new MyCustomXmlWriter(writer));
and thereby have the DataSet use MyCustomXmlWriter rather than the default writer (WCF's System.Xml.XmlBinaryWriter). You'll note that MyCustomXmlWriter accepts the original writer as a constructor parameter; it does this so it can simply forward every call it receives on to the original writer. I won't list the entire MyCustomXmlWriter implementation, but you'll get the idea from the following:
    public class MyCustomXmlWriter : System.Xml.XmlWriter
    {
        System.Xml.XmlWriter _innerXmlWriter;

        public MyCustomXmlWriter(System.Xml.XmlWriter innerXmlWriter)
        {
            _innerXmlWriter = innerXmlWriter;
        }

        public override void Close()
        {
            _innerXmlWriter.Close();
        }

        public override void Flush()
        {
            _innerXmlWriter.Flush();
        }

        // etc.
    }
We override every method within MyCustomXmlWriter and simply make the identical call to the XmlWriter which was passed to our constructor (_innerXmlWriter). Again, we don't seem to have gained a lot. Well, you'll recall that the call which System.Data.XmlDataTreeWriter.XmlDataRowWriter is getting wrong is the call to System.Xml.XmlWriter.WriteAttributeString, so if we intercept that call and correct it we've solved the problem. As it happens, WriteAttributeString can't be overridden so it looks like we might be out of luck. But let's take a look at what the two-string overload of WriteAttributeString actually does using .NET Reflector:
    public void WriteAttributeString(string localName, string value)
    {
        this.WriteStartAttribute(null, localName, null);
        this.WriteString(value);
        this.WriteEndAttribute();
    }
That's interesting, because WriteStartAttribute can be overridden; it takes the parameters prefix (set to null), localName (passed from the caller) and ns (set to null). With this knowledge in hand we can create an override of WriteStartAttribute which doesn't simply pass the call straight through to _innerXmlWriter, but looks for the situation caused by the bug and corrects for it:
    public override void WriteStartAttribute(string prefix, string localName, string ns)
    {
        if (prefix == null && ns == null && localName != null && localName.Contains(":"))
        {
            // The caller passed "prefix:name" as the local name; split it and
            // forward the two parts separately.
            string[] array = localName.Split(new char[] { ':' });
            _innerXmlWriter.WriteStartAttribute(array[0], array[1], null);
        }
        else
        {
            _innerXmlWriter.WriteStartAttribute(prefix, localName, ns);
        }
    }
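For reference, a complete forwarding writer along these lines might look like the sketch below. It is an illustration rather than the exact class used in the project: the name PrefixFixingXmlWriter is hypothetical, the list of overrides reflects the members of System.Xml.XmlWriter that I believe are abstract, and the colon check splits at the first colon only.

```csharp
using System;
using System.Xml;

public class PrefixFixingXmlWriter : XmlWriter
{
    private readonly XmlWriter _inner;

    public PrefixFixingXmlWriter(XmlWriter inner)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        _inner = inner;
    }

    // The one call that is corrected rather than forwarded verbatim.
    public override void WriteStartAttribute(string prefix, string localName, string ns)
    {
        int colon = (prefix == null && ns == null && localName != null) ? localName.IndexOf(':') : -1;
        if (colon > 0 && colon < localName.Length - 1)
        {
            // "xmlns:xs" becomes prefix "xmlns" and local name "xs".
            _inner.WriteStartAttribute(localName.Substring(0, colon), localName.Substring(colon + 1), null);
        }
        else
        {
            _inner.WriteStartAttribute(prefix, localName, ns);
        }
    }

    // Everything else is passed straight through to the wrapped writer.
    public override WriteState WriteState { get { return _inner.WriteState; } }
    public override void Close() { _inner.Close(); }
    public override void Flush() { _inner.Flush(); }
    public override string LookupPrefix(string ns) { return _inner.LookupPrefix(ns); }
    public override void WriteStartDocument() { _inner.WriteStartDocument(); }
    public override void WriteStartDocument(bool standalone) { _inner.WriteStartDocument(standalone); }
    public override void WriteEndDocument() { _inner.WriteEndDocument(); }
    public override void WriteDocType(string name, string pubid, string sysid, string subset) { _inner.WriteDocType(name, pubid, sysid, subset); }
    public override void WriteStartElement(string prefix, string localName, string ns) { _inner.WriteStartElement(prefix, localName, ns); }
    public override void WriteEndElement() { _inner.WriteEndElement(); }
    public override void WriteFullEndElement() { _inner.WriteFullEndElement(); }
    public override void WriteEndAttribute() { _inner.WriteEndAttribute(); }
    public override void WriteCData(string text) { _inner.WriteCData(text); }
    public override void WriteComment(string text) { _inner.WriteComment(text); }
    public override void WriteProcessingInstruction(string name, string text) { _inner.WriteProcessingInstruction(name, text); }
    public override void WriteEntityRef(string name) { _inner.WriteEntityRef(name); }
    public override void WriteCharEntity(char ch) { _inner.WriteCharEntity(ch); }
    public override void WriteWhitespace(string ws) { _inner.WriteWhitespace(ws); }
    public override void WriteString(string text) { _inner.WriteString(text); }
    public override void WriteSurrogateCharEntity(char lowChar, char highChar) { _inner.WriteSurrogateCharEntity(lowChar, highChar); }
    public override void WriteChars(char[] buffer, int index, int count) { _inner.WriteChars(buffer, index, count); }
    public override void WriteRaw(char[] buffer, int index, int count) { _inner.WriteRaw(buffer, index, count); }
    public override void WriteRaw(string data) { _inner.WriteRaw(data); }
    public override void WriteBase64(byte[] buffer, int index, int count) { _inner.WriteBase64(buffer, index, count); }
}
```

A quick way to exercise it outside WCF is to wrap a writer obtained from XmlWriter.Create, start an element, and call WriteAttributeString("xmlns:xs", "http://www.w3.org/2001/XMLSchema"); the wrapped writer then receives a prefix of xmlns and a local name of xs instead of a single colon-containing name.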
The down-side to this work-around is that it requires a small snippet of code to be added to the auto-generated C# class for the typed DataSet. If the class is re-generated, these changes will be discarded. If anyone has a better work-around, I'd love to hear about it.