Tuesday, 23 November 2010

Detecting Month-Ends in JavaScript


I came across the following problem many years ago whilst doing some work for a financial services company. They had some code (which I don't think was written by me, but my memory is a little hazy on the subject) which attempted to establish whether the current date was the last date in a month. The code was written in JavaScript.

The normal way of performing such a test is to:

  • (a) establish what month the current date falls within,
  • (b) add a single day to the current date,
  • (c) establish what month the date resulting from (b) falls within, and
  • (d) compare values (a) and (c) - if they are different then the current date is indeed the last date in a month.

Steps (a) and (c) are easy - JavaScript's Date object provides a getMonth() method which returns the zero-based month number (i.e. 0=January, 1=February, etc.).

Step (b) isn't so easy, as JavaScript's Date object does not provide any date arithmetic operations per se. That is, it does not provide an addDays() method or any equivalent thereof.

Well, the value actually stored internally by the Date object is the number of milliseconds between the date/time being represented and midnight (GMT) on 1st January 1970. We can read this value via the getTime() method, and we can create a Date from such a value via the single-parameter constructor. That is, I can obtain the internally stored value of a Date object called now via ms = now.getTime(); and I can create a Date object from this value via then = new Date(ms);. As we know how many milliseconds there are in a day (1000*60*60*24), we can add a day's worth of milliseconds to a given date to add one day to it - which is exactly what step (b) requires.

So, putting it all together we get:
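A sketch of the resulting function (the function and variable names here are my own) looks like this:

```javascript
// number of milliseconds in one day
var MS_PER_DAY = 1000 * 60 * 60 * 24;

function isLastDayOfMonth(date) {
  // (a) establish what month the current date falls within
  var thisMonth = date.getMonth();
  // (b) add a single day, by adding a day's worth of milliseconds
  var tomorrow = new Date(date.getTime() + MS_PER_DAY);
  // (c) establish what month the resulting date falls within
  var nextMonth = tomorrow.getMonth();
  // (d) if the two months differ, the current date is the last date in a month
  return thisMonth !== nextMonth;
}
```

One caveat worth noting: getTime() counts milliseconds from a UTC epoch, whereas getMonth() is evaluated in local time, so on a day when a daylight-saving transition occurs (a 25-hour day, say) adding exactly 24 hours' worth of milliseconds may not land on the next calendar day.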

Thursday, 28 October 2010

Selectively Overriding XML Serialisation Attributes


As I mentioned in my last post, although you can override XML serialisation attributes by passing an XmlAttributeOverrides instance to an XmlSerializer, the attributes you provide for a given type and member replace all the existing XML serialisation attributes - you can't simply tweak one or two and leave the rest intact.

If you're thinking that types and members only tend to have one or two serialisation attributes, then take a look at this set of attributes from an auto-generated EquityDerivativeBase class:

[System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "4.0.30319.1")]

Let's suppose I want to use XmlAttributeOverrides to alter the value of the XmlTypeAttribute at run-time, to place the element in a different namespace. Well, I can. But the XmlAttributeOverrides instance I supply is used to replace all the existing attributes. So I lose each of the XmlIncludeAttribute attributes which define the classes which use this class as a base class.

Book and Genre classes (with Xml Attributes)

To demonstrate how to override attributes selectively, I'm going to use the same Book class as in my last post. I've added a lot more attributes to the members of the Book class to demonstrate that they all get retained.

[XmlRoot("book", Namespace="http://tempuri.org")]
public class Book
{
  public int InternalId { get; set; }

  public string Title { get; set; }

  public string[] Authors { get; set; }

  public string Isbn13 { get; set; }

  public string Extract { get; set; }

  public Genre Genre { get; set; }

  [XmlNamespaceDeclarations]
  public XmlSerializerNamespaces XmlNamespaces { get; set; }

  [XmlAnyAttribute]
  public XmlAttribute[] OtherAttributes { get; set; }

  [XmlAnyElement]
  public XmlElement[] OtherElements { get; set; }

  public Book()
  {
    XmlNamespaces = new XmlSerializerNamespaces();
    XmlNamespaces.Add("ns", "http://tempuri.org");
  }
}

public enum Genre
{
  // members omitted
}


The solution is to copy all the existing attributes into an XmlAttributeOverrides instance (modifying them as they're copied), and then apply the XmlAttributeOverrides to the XmlSerializer. That way the XmlAttributeOverrides object retains all of the original attributes (with the exception of any changes made in transit). Let me show you what I mean:
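The key is that XmlAttributes has a constructor which accepts an ICustomAttributeProvider (a Type or a MemberInfo) and populates itself from the XML serialisation attributes already present on it. A sketch of the approach for the type-level attributes (the replacement namespace here is purely illustrative):

```csharp
// copy all the existing XML serialisation attributes from the Book type...
XmlAttributes typeAttributes = new XmlAttributes(typeof(Book));

// ...modify the copy 'in transit' - here, moving the root element into a
// different (illustrative) namespace...
typeAttributes.XmlRoot.Namespace = "http://tempuri.org/v2";

// ...and register the copied-and-tweaked set as the overrides; everything
// we didn't explicitly change is retained
XmlAttributeOverrides overrides = new XmlAttributeOverrides();
overrides.Add(typeof(Book), typeAttributes);

XmlSerializer serializer = new XmlSerializer(typeof(Book), overrides);
```

The same constructor works per member too: build an XmlAttributes from each MemberInfo and register it via overrides.Add(type, memberName, attributes).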

Sunday, 17 October 2010

Default Values Don't Get Serialised

The Problem

I came across some unusual behaviour during XML Serialisation recently and thought I'd share both the problem and my solution to it.

To demonstrate the behaviour I've created a small console application which serialises two Book objects to the Console. I've marked the default value of the Author property as "Anonymous", which seems reasonable enough. The second of the two Book objects specifically has its Author property set to "Anonymous", which it doesn't really need to have - as that's the default value.

Here's the code:

using System;
using System.ComponentModel;
using System.IO;
using System.Xml.Serialization;

namespace ConsoleApplication
{
  public class Program
  {
    static void Main(string[] args)
    {
      // create some Books
      Book[] books = new Book[]
      {
        new Book { Title = "The Road Ahead", Author = "Bill Gates", Isbn13 = "978-0670859139" },
        new Book { Title = "Beowulf", Author = "Anonymous", Isbn13 = "978-1588278296" },
      };

      // serialise it into a MemoryStream
      XmlSerializer xmlSerializer = new XmlSerializer(typeof(Book[]));
      MemoryStream memoryStream = new MemoryStream();
      xmlSerializer.Serialize(memoryStream, books);

      // write the contents of the MemoryStream to the Console
      memoryStream.Position = 0L;
      StreamReader streamReader = new StreamReader(memoryStream);
      Console.WriteLine(streamReader.ReadToEnd());

      // wait for user to hit ENTER
      Console.ReadLine();
    }
  }

  public class Book
  {
    [XmlElement("title")]
    public string Title { get; set; }

    [XmlElement("author")]
    [DefaultValue("Anonymous")]
    public string Author { get; set; }

    [XmlElement("isbn13")]
    public string Isbn13 { get; set; }
  }
}

When I run the above, something odd happens. The Author property is not serialised for the second book, as you'll see below. It turns out that this is by design. The theory is that if the default value has been used, then there's no need to explicitly output it - a consumer of the data will simply see that the element is missing and infer the default value.

<?xml version="1.0"?>
<ArrayOfBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Book>
    <title>The Road Ahead</title>
    <author>Bill Gates</author>
    <isbn13>978-0670859139</isbn13>
  </Book>
  <Book>
    <title>Beowulf</title>
    <isbn13>978-1588278296</isbn13>
  </Book>
</ArrayOfBook>

The problem, obviously, is that this approach assumes that the consumer is aware that a default value exists. Can you see any indication from the serialised XML that a default value exists, or that an element has been suppressed because its value was equal to the default value? No - that information is hidden from the consumer. We're simply left to hope that the consumer is aware of this default value.

Saturday, 10 April 2010

Validating Integers in SQL Server


Hypothetical scenario: Data has been loaded into a SQL Server table from a CSV file; each column in the table has been defined as VARCHAR(255) to prevent the data load job from failing if one or more values are not of the expected data type. Your job is to copy data from this 'raw' table into one where the columns are strongly typed (i.e. INT, DATETIME, DECIMAL, etc.). The requirement is to pull through those rows which are entirely valid, but to ignore those rows where any value in the VARCHAR column cannot be successfully converted to the target data type. Actually, if rows which are only partially valid could be copied across too, with invalid values being set to NULL in the target table, then that would be even better.

The requirement seems simple enough, but how would you do this? It's actually quite tricky to get right. So I'm going to make the scenario even simpler - you only have to support INT data types. Your job is simply to make sure that the supplied VARCHAR can be converted to an INT before you actually attempt to do so.


Your first port of call might be to use ISNUMERIC to establish whether a source value is numeric. This sounds reasonable enough.

SELECT ISNUMERIC('9876543210')

The above statement returns 1 - the value '9876543210' is numeric. But is it an INT? No - it's too large. The documentation for ISNUMERIC says that it returns 1 if the supplied character column can be converted to at least one of the numeric data types. So that's not particularly useful if we're trying to establish whether a value is a valid INT or not.
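It's easy to demonstrate just how permissive ISNUMERIC is. None of the following values can be converted to an INT, yet every one of these statements returns 1:

```sql
SELECT ISNUMERIC('9876543210')  -- 1, but the value is too large for an INT
SELECT ISNUMERIC('12.5')        -- 1, a valid DECIMAL but not an INT
SELECT ISNUMERIC('1e5')         -- 1, a valid FLOAT, but CONVERT(INT, '1e5') fails
SELECT ISNUMERIC('$')           -- 1, because it's convertible to MONEY, of all things
```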

Wednesday, 31 March 2010

Strong Name Storage


In Adding a Strong Name to a Third-Party Assembly I described how you can add a strong name to a third-party assembly by disassembling that assembly into IL using ILDASM and then re-assembling it with ILASM, specifying a strong name key.

I never really liked that approach. It seemed like using a sledge-hammer to crack a nut: just because we want to add a few bytes (i.e. the strong name) into an assembly, we have to break the entire assembly apart and put it back together again?

It always struck me as odd that the Strong Name Tool (SN.exe) provided no support for adding a strong name signature to an assembly unless one already existed, or the assembly had at least been delay-signed at the compilation stage. Is adding a few bytes into the assembly to accommodate the strong name signature really that hard?

So for the last couple of weeks I've been studying both:

These documents describe the physical structure of .NET assemblies. Using the information contained therein, I've been able to establish just how a strong name is stored within an assembly. It's not pretty.

Friday, 12 March 2010

Launching a ClickOnce Application


If you're anything like me, you won't be content with knowing that browsing to http://MyServer/MyVirtualDir/MyApplication.application downloads, installs and launches MyApplication. You'll want to know how this is achieved. This post goes under the covers of ClickOnce to show you how it's done. Well, at least how some of it is done.

As this post makes quite a lot of references to registry entries, I'll use the standard abbreviations HKCU and HKCR to represent the HKEY_CURRENT_USER and HKEY_CLASSES_ROOT hives respectively.

Retrieving the Deployment Manifest

What happens when you browse to http://MyServer/MyVirtualDir/MyApplication.application?

The first thing to realise is that no 'magic' is happening behind the scenes. When you send an HTTP GET request to a web server (Internet Information Services, IIS, for example) the web server will typically react in one of two ways. It will either establish that your request must be forwarded to a component on the server for processing (a 'handler' in IIS terminology) or will establish that the request represents a static file which should simply have its contents returned.

IIS stores details of handlers, and the requests which are forwarded to each handler, in its metabase. Handlers can be defined in terms of a script map (potentially in combination with a managed handler) or a module mapping. For example:

  • Requests matching the pattern *.asp are forwarded, via a script map, to the ISAPI module %windir%\system32\inetsrv\asp.dll.
  • Requests matching the pattern *.aspx are forwarded, via a script map, to the ISAPI module %windir%\Microsoft.NET\Framework\v2.0.50727\aspnet_isapi.dll and to the managed handler System.Web.UI.PageHandlerFactory.
  • Requests matching the pattern *.shtml are forwarded, via a module mapping, to the pre-registered ServerSideIncludeModule module.

Tuesday, 2 March 2010

Serialising Enumerations with WCF


There are occasions when the exceptions thrown by Windows Communication Foundation (WCF) are clear and unambiguous. There are also occasions when they are not. In this post I'll describe a real-world problem which a colleague encountered recently, together with the steps which you might reasonably perform to diagnose it.

The problem manifested itself in a System.ServiceModel.CommunicationException being thrown with nothing particularly useful in the Exception message itself.

System.ServiceModel.CommunicationException: An error occurred while receiving the HTTP response to http://localhost:8731/Design_Time_Addresses/WcfServiceLibrary/Service1/. This could be due to the service endpoint binding not using the HTTP protocol. This could also be due to an HTTP request context being aborted by the server (possibly due to the service shutting down). See server logs for more details.
---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a receive.
---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host

Clearly something very bad happened. But the WCF service itself was still up and running, so one of the theories put forward in the message above "possibly due to the service shutting down" can be quickly dismissed. I'm sure it's obvious, but the above message was generated by my mock-up of the problem rather than the original problem itself.

Mock-up of Original Problem

Let me quickly run through the mock-up of the problem which I put together. It consists of three projects: WcfServiceLibrary contains a single WCF service called Service1, WcfConsoleHost provides a host for Service1 in the form of a Console Application, and WcfClient consumes Service1.

Solution Explorer view of SerialisingEnumerationsWithWCF.sln

Tuesday, 23 February 2010

Faking a Strong Name


Once upon a time a strong name was just that - strong. If you'd referenced an assembly with a strong name within your application you could be sure that that's the assembly you were going to get at run-time. Well, even that was never true. An assembly re-direct could cause you to be delivered a different version, but it would always have to have the same name and public key token, and hence must have originated from the same source. The key thing is that no-one could slip your application a fake assembly and pass it off as the real thing.

Well, it turns out that now they can. You simply place your fake assembly in the GAC folder within which the real version of the assembly would logically reside. The name of the folder is used to establish its version and public key token. Don't believe me? Read on...


The scenario is based upon a situation I encountered a few years ago. ThirdPartySecurity contains a class called DirectoryServices which is essentially a wrapper around System.DirectoryServices - it allows the caller to easily perform some look-ups against Active Directory ('who are the members of this group', that kind of thing). ConsoleApplication has a reference to ThirdPartySecurity and simply acts as a test harness. MockDirectoryServices is exactly that - a mocked-up implementation of System.DirectoryServices. It is not referenced by anything.

Solution Explorer view of FakingAStrongName.sln

Tuesday, 16 February 2010

Evading the Strong Name Integrity Check


In a recent post, Tampering with a Strong-Named Assembly, I pointed out that the Strong-Name Bypass Feature in .NET 3.5 SP1 means that assemblies which have been tampered with are loaded into a full-trust AppDomain without complaint. This appears to be based upon the logic that if you can't even look after the assemblies in your own application folder, why should you expect Microsoft to do so?

But the Global Assembly Cache (GAC) is different. When you attempt to install an assembly into the GAC the integrity of its strong name is verified anyway. After all, GACUTIL doesn't know anything about the AppDomain into which the assembly will be loaded. If we attempt to install the "I am a hacker" version of the StrongNamedLibrary.dll we created during Tampering with a Strong-Named Assembly into the GAC we'll get an error:

GACUTIL /i StrongNamedLibrary.dll

Microsoft (R) .NET Global Assembly Cache Utility.  Version 3.5.30729.1
Copyright (c) Microsoft Corporation.  All rights reserved.

Failure adding assembly to the cache: Strong name signature could not be verified.  Was the assembly built delay-signed?

Good. That's what should happen. That version of StrongNamedLibrary.dll had been tampered with and GACUTIL picked up on the fact.

However, as we discovered in Where is the Global Assembly Cache?, using GACUTIL isn't the only way to install an assembly into the GAC. We can simply copy it to a folder named %SystemRoot%\assembly\GAC_ProcessorArchitecture\AssemblyName\Version_Culture_PublicKeyToken. Clearly if we do that, we'll evade the strong name integrity check which GACUTIL performs. Will anything else detect the tampering? Let's find out.

Monday, 8 February 2010

Where is the Global Assembly Cache?


When I first came across the Global Assembly Cache (GAC), back in the days of .NET 1.1, my first thoughts were: "But what is it? And where is it?". Even if I look up its official definition now, all I get is:

A machine-wide code cache that stores assemblies specifically installed to be shared by many applications on the computer. Applications deployed in the global assembly cache must have a strong name. See also: assembly cache, strong name.

Looking up assembly cache in the same list of definitions gives:

A code cache used for side-by-side storage of assemblies. There are two parts to the cache: the global assembly cache contains assemblies that are explicitly installed to be shared among many applications on the computer; the download cache stores code downloaded from Internet or intranet sites, isolated to the application that caused the download so that code downloaded on behalf of one application or page does not impact other applications. See also: global assembly cache.

I don't like vagueness. Are the assemblies stored in a database? As binary objects within the registry? As files on the file system? Where?

I decided to find out the answer. It wasn't hard. But every time I mention it to people they seem surprised. So I thought I'd quickly document it here.

Monday, 1 February 2010

Tampering with a Strong-Named Assembly


If you've read the Microsoft documentation on .NET you'll know that the use of Strong Names is recommended. The Code Analyser within Visual Studio goes as far as generating error CA2210, "Assemblies should have valid strong names", if the assembly does not have a strong name.

One of the reasons given for the use of strong names is that they're tamper-proof. For example, Strong-Named Assemblies says: Strong names provide a strong integrity check. Passing the .NET Framework security checks guarantees that the contents of the assembly have not been changed since it was built. Also, Assemblies should have valid strong names says: This rule retrieves and verifies the strong name of an assembly. A violation occurs if... [t]he assembly was altered after signing.

So, strong-named assemblies are tamper-proof. Right? Wrong.

Watch, I'll Show You

Let's create a really simple class library and ensure it has a strong name. We start by firing up Visual Studio 2008 and creating a new .NET 3.5 Class Library called StrongNamedLibrary. We then replace the contents of Class1.cs with this code:

namespace StrongNamedLibrary
{
    public class Class1
    {
        public string Message
        {
            get { return "I am a friend"; }
        }
    }
}

We can then use the Signing section of the Project Properties pane to create and assign a public/private key pair. We do this by clicking the Sign the assembly check box and selecting New from the Choose a strong name key file drop-down. When prompted, we specify a name of StrongNamedLibrary (with no extension) and whatever password we like.

Before we finish with StrongNamedLibrary, I want to make sure it's easy for us to tweak the version number outside of Visual Studio, so let's open Properties\AssemblyInfo.cs and change the AssemblyVersion attribute to 8961.26437.43913.61389 (this should be easy to spot in the generated binary file as it'll appear as 0123456789ABCDEF in hexadecimal).

Wednesday, 27 January 2010

Undropping a SQL Server Table - Part 6


We've pretty much wrapped up this series on how to recover data from tables which had been accidentally dropped, and for which no backup existed.

So far, we’ve created a stored procedure with which to extract a hexadecimal dump of a specified page (Part 1), identified which pages contained data from which tables (Part 2), parsed each page into individual records (Part 3), parsed those records (which were still in raw hexadecimal form) into individual columns to create a standard SQL Server rowset (Part 4), and generated the record-parsing code dynamically for a table with an arbitrary structure (Part 5).

This final part will discuss how to cope when you don't know the structure of the table you're trying to recover. As the meta-data (column names, types, etc.) will have been lost when the table was dropped, and as there isn't a way (that I know of) to recover that meta-data, we'll have to guess it. This post is about that guess-work.

This Might Not Work

All of the previous posts in this series were describing the steps I'd taken to recover some dropped tables. They were describing a process I knew worked because I'd used it to recover 10GB or so of data. This post is different. I plan to guess the structure of a table and write about it as I go. This means that when I make mistakes (which I will) you'll get to read about them. In fact, it might be that I can't recover this data because I can't work out the table structure. We'll have to wait and see.

Monday, 18 January 2010

Undropping a SQL Server Table - Part 5


This series of posts covers the steps required to recover data from tables which had been accidentally dropped, and for which no backup existed.

So far, we’ve created a stored procedure with which to extract a hexadecimal dump of a specified page (Part 1), identified which pages contained data from which tables (Part 2), parsed each page into individual records (Part 3), and parsed those records (which were still in raw hexadecimal form) into individual columns to create a standard SQL Server rowset (Part 4).

This part will present the SQL which can be used to auto-generate the statement we came up with in the last post for a table with an arbitrary structure.

Record Structure

The statement we're trying to generate is the one which appears at the end of Part 4. I should warn you in advance that the code presented below is ugly. Very ugly. I shouldn't have attempted to generate the SQL via SQL. But I did. Or, at the very least, I should have refactored some of the repetitive code into user-defined functions. But I didn't. So the code below is what I actually used to generate the SQL to perform the parsing. Sorry.

I won't walk through this SQL as I have done with previous samples, both because I walked through the SQL we're trying to generate in the last post and because I think the comments should fully explain what's going on.

USE DataRecovery

-- note that only the following data types are supported by this script:

-- specify the table we're attempting to recover; we use this to key into LazarusColumn
-- to establish the structure of the table whose records are in LazarusRecord
DECLARE @TableName SYSNAME
SET @TableName = 'TestTable' -- <<< change this

-- @SQL will contain the dynamic SQL which we'll ultimately execute
DECLARE @SQL VARCHAR(MAX)

-- @Position contains the 1-based position within LazarusRecord.Data which we're currently
-- focussed on
DECLARE @Position INT
SET @Position = 1

Tuesday, 12 January 2010

Undropping a SQL Server Table - Part 4


If you’ve been following this series of posts you know that I’m running through the steps I used to recover approximately 10GB of data from tables which had been accidentally dropped, and for which no backup existed.

So far, we’ve created a stored procedure with which to extract a hexadecimal dump of a specified page (Part 1), identified which pages contained data from which tables (Part 2), and parsed each page into individual records (Part 3).

This part will parse those records (which are still in raw hexadecimal form) into individual columns (i.e. into CHARs, DECIMALs, INTs, etc).

Record Structure

Before we actually parse the record, let's quickly re-cap its structure. In Part 2 of this series, I described the basic layout of a record as follows:

Offset                                              Size                   Mnemonic
0                                                   1                      TagA
1                                                   1                      TagB
2                                                   2                      Fsize
4                                                   Fsize - 4              Fdata
Fsize                                               2                      Ncol
Fsize + 2                                           ((NCol - 1) / 8) + 1   Nullbits
Fsize + 2 + size of Nullbits                        2                      VarCount
Fsize + 2 + size of Nullbits + 2                    VarCount * 2           VarOffset
Fsize + 2 + size of Nullbits + 2 + (VarCount * 2)   (varies)               VarData

All the data pertaining to fixed-length columns is stored in Fdata, and all the data pertaining to variable-length columns is stored in VarData.

Fdata Structure

The data for all fixed width columns is stored in one contiguous block called Fdata. The number of bytes allocated to each column is defined by its data type (and size in the case of CHAR and NCHAR columns, and precision in the case of DECIMAL columns). The number of bytes occupied does not vary from row to row. This applies even if a column contains NULL: a CHAR(10) which is NULL still occupies 10 bytes within Fdata.

The following table summarises the format in which some common fixed-length columns are stored.

Data Type      Storage
INT            Four-byte signed integer
SMALLINT       Two-byte signed integer
TINYINT        One-byte unsigned integer
CHAR(n)        n bytes, encoded according to the relevant code page
MONEY          Eight-byte signed integer; the final four digits fall to the right of an implied decimal point
DATETIME       Four-byte signed integer specifying the number of days before or after 1st January 1900, followed by a four-byte unsigned integer containing the number of 1/300th-of-a-second intervals since midnight
FLOAT          Eight-byte value as specified by IEEE 754
DECIMAL(p,s)   A single sign byte (0x00 or 0x01) followed by a four-, eight-, twelve- or sixteen-byte integer with the final s digits to the right of an implied decimal point

Note that multi-byte numeric values are always stored with their bytes reversed.

Fixed-Width Examples

It always helps to see an example or two, so here are some examples of these data types in action:
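A quick way to see the logical byte patterns for yourself is CONVERT(BINARY(n), ...) in SSMS; just remember that within Fdata on the page these bytes appear reversed, as noted above:

```sql
-- an INT occupies four bytes: 1234567890 is 0x499602D2 logically,
-- so within Fdata it appears as D2 02 96 49
SELECT CONVERT(BINARY(4), 1234567890)             -- 0x499602D2

-- a SMALLINT occupies two bytes: -1 is 0xFFFF (two's complement)
SELECT CONVERT(BINARY(2), CAST(-1 AS SMALLINT))   -- 0xFFFF

-- a CHAR(10) holding 'ABC' is space-padded (0x20) to its full fixed width
SELECT CONVERT(BINARY(10), CAST('ABC' AS CHAR(10)))   -- 0x41424320202020202020
```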

Monday, 4 January 2010

Undropping a SQL Server Table - Part 3


This series of posts describes the steps I went through to recover about 10GB worth of data in tables which had been accidentally dropped. In Part 1 I explained that no backup existed and that the use of third-party tools was not an option. I went on to provide a stored procedure which wrapped a call to DBCC PAGE. In Part 2 I used the stored procedure to extract a hexadecimal dump of relevant pages within the database, and to extract various items of meta-data about both the page and the first record on the page. This post will use that information to identify to which table each page belonged, and will then split the pages for a given table into individual records.


Throughout this and the remaining posts of this series I'll attempt to consistently refer to the subdivision of a page within which data is stored as a slot, to the raw hexadecimal data within a slot as a record (or slot data), and to the parsed data as a row. I'll probably manage to deviate from this somewhere along the way, but that's how I'm intending to use those terms. Feel free to point out any mistakes.

Rogue Data Pages

I'm not sure in what way the pages which make up a SQL Server database are initialised. Logic would suggest that there's no point in initialising them at all, as the GAM, SGAM, IAM etc. will simply not reference them. But we're not using those structures to identify whether a page is actually in use or not - we're parsing all pages. So is it possible for us to stumble across some random uninitialised page which looks like a data page simply because offset 1 of the page is 0x01? It seems that the answer is yes. I certainly came across some pages which had 0x01 in offset 1 but which were clearly not valid data pages, by virtue of the fact that the slot 0 offset (i.e. the last 4 bytes of the page) pointed outside the page, or pointed to a location within the page header. Now, it could certainly be argued that I should really have discarded those in the code I presented in Part 2 of this series. But I didn't - I did it afterwards. So that's why I'm providing this snippet now:

Sunday, 3 January 2010

Minor Changes to Blog

I made some changes to this blog over the last few days that I thought I should just make you aware of in case they cause problems.

Firstly, I added "Continue reading this post here" links to each non-trivial post on the home page. I did this to make it more likely that the search engines would index the actual post pages (which wouldn't simply be considered to be duplicates of the home page) and to reduce the overall size of the home page.

Secondly, I re-directed the Atom and RSS feeds through FeedBurner so I can assess how many subscribers the blog has (Blogger seems to provide no intrinsic support for this).

Finally, I abandoned my use of Word 2007 as my Blogger client, as I was unhappy with the HTML it was producing. I've re-written the HTML for all the previous posts (whilst, hopefully, not changing any of the actual content) and will now seek out an alternative client.

I don't suppose these changes will cause anyone any problems. But I thought I should mention it anyway.