Wednesday, 27 January 2010

Undropping a SQL Server Table - Part 6

Introduction

We've pretty much wrapped-up this series on how to recover data from tables which had been accidentally dropped, and for which no backup exists.

So far, we’ve created a stored procedure with which to extract a hexadecimal dump of a specified page (Part 1), identified which pages contained data from which tables (Part 2), parsed each page into individual records (Part 3), parsed those records (which were still in raw hexadecimal form) into individual columns to create a standard SQL Server rowset (Part 4), and generated the record-parsing code dynamically for a table with an arbitrary structure (Part 5).

This final part will discuss how to cope when you don't know the structure of the table you're trying to recover. As the meta-data (column names, types, etc.) will have been lost when the table was dropped, and as there isn't a way to recover that meta-data (which I know of), we'll have to guess it. This post is about that guess-work.

This Might Not Work

All of the previous posts in this series were describing the steps I'd taken to recover some dropped tables. They were describing a process I knew worked because I'd used it to recover 10GB or so of data. This post is different. I plan to guess the structure of a table and write about it as I go. This means that when I make mistakes (which I will) you'll get to read about them. In fact, it might be that I can't recover this data because I can't work out the table structure. We'll have to wait and see.

Monday, 18 January 2010

Undropping a SQL Server Table - Part 5

Introduction

This series of posts covers the steps required to recover data from tables which had been accidentally dropped, and for which no backup existed.

So far, we’ve created a stored procedure with which to extract a hexadecimal dump of a specified page (Part 1), identified which pages contained data from which tables, (Part 2), parsed each page into individual records (Part 3), and parsed those records (which were still in raw hexadecimal form) into individual columns to create a standard SQL Server rowset (Part 4).

This part will present the SQL which can be used to auto-generate the statement we came up with in the last post for a table with an arbitrary structure.

Record Structure

The statement we're trying to generate is that which appears at the end of (Part 4). I should warn you in advance that the code presented below is ugly. Very ugly. I shouldn't have attempted to generate the SQL via SQL. But I did. Or, at least I should have refactored some of the repetitive code into user-defined function. But I didn't. So the code below is what I actually used to generate the SQL to perform the parsing. Sorry.

I won't walk through the SQL, as I have done with previous samples, as I walked through the SQL we're trying to generate in the last post and because I think the comments herein should fully explain what's going on.

USE DataRecovery
GO

-- note that only the following data types are supported by this script:
-- CHAR, DATETIME, DECIMAL, FLOAT, INT, MONEY, NVARCHAR, SMALLINT, TINYINT, VARCHAR

-- specify the table we're attempting to recover; we use this to key into LazarusColumn
-- to establish the structure of the table whose records are in LazarusRecord
DECLARE @TableName SYSNAME
SET @TableName = 'TestTable' -- <<< change this
 
-- @SQL will contain the dynamic SQL which we'll ultimately execute
DECLARE @SQL VARCHAR(MAX)
 
-- @Position contains the 1-based position within LazarusRecord.Data which we're currently
-- focussed on
DECLARE @Position INT
SET @Position = 1

Tuesday, 12 January 2010

Undropping a SQL Server Table - Part 4

Introduction

If you’ve been following this series of posts you know that I’m running through the steps I used to recover approximately 10GB of data from tables which had been accidentally dropped, and for which no backup existed.

So far, we’ve created a stored procedure with which to extract a hexadecimal dump of a specified page (Part 1), identified which pages contained data from which tables (Part 2), and parsed each page into individual records (Part 3).

This part will parse those records (which are still in raw hexadecimal form) into individual columns (i.e. into CHARs, DECIMALs, INTs, etc).

Record Structure

Before we actually parse the record, let’s a quickly re-cap of its structure. In Part 2 of this series, I described the basic layout of a record as follows:

Offset Size Mnemonic
0 1 TagA
1 1 TagB
2 2 Fsize
4 Fsize – 4 Fdata
Fsize 2 Ncol
Fsize + 2 ((NCol - 1) / 8) + 1 Nullbits
Fsize + 2 + size of Nullbits 2 VarCount
Fsize + 2 + size of Nullbits + 2 VarCount * 2 VarOffset
Fsize + 2 + size of Nullbits + 2 + (VarCount * 2) (varies) VarData

All the data pertaining to fixed-length columns is stored in Fdata, and all the data pertaining to variable-length columns is stored in VarData.

Fdata Structure

The data for all fixed width columns is stored in one contiguous block called Fdata. The number of bytes allocated to each column is defined by its data type (and size in the case of CHAR and NCHAR columns, and precision in the case of DECIMAL columns). The number of bytes occupied does not vary from row to row. This applies even if a column contains NULL: a CHAR(10) which is NULL still occupies 10 bytes within Fdata.

The following table summarises the format in which some common fixed-length columns are stored.

Data Type Storage
INT Four-byte signed integer
SMALLINT Two-byte signed integer
TINYINT One-byte signed integer
CHAR(n) n bytes, encoded according to the relevant code page
MONEY Eight-byte signed integer; final four digits are to the right of an implied decimal point
DATETIME Four-byte signed integer specifying the number of days before or after 1st January 1900 followed by a four-byte unsigned integer containing the number of 300th of a second intervals since midnight
FLOAT Eight-byte value as specified by IEEE 754
DECIMAL(p,s) A single sign byte (0x00 or 0x01) followed by a four-, eight-, twelve- or sixteen-byte signed integer with the final s digits to the right of an implied decimal point

Note that multi-byte numeric values are always stored with their bytes reversed.

Fixed-Width Examples

It always helps to see an example or two, so here are some examples of these data types in action:

Monday, 4 January 2010

Undropping a SQL Server Table - Part 3

Introduction

This series of posts describes the steps I went through to recover about 10GB worth of data in tables which had been accidentally dropped. In Part 1 I explained that no backup existed and that the use of third-party tools was not an option. I went on to provide a stored procedure which wrapper a call to DBCC PAGE. In Part 2 I used the stored procedure to extract a hexadecimal dump of relevant pages within the database, and to extract varies items of meta-data about both the page and the first record on the page. This post will use that information to identify to which table each page belonged and will then split the pages for a given table into individual records.

Terminology

Throughout this and the remaining posts of this series I'll attempt to consistently refer to the subdivision of a page within which data is stored as a slot, to the raw hexadecimal data within a slot as a record (or slot data), and to the parsed data as a row. I'll probably manage to deviate from this somewhere along the way, but that's how I'm intending to use those terms. Feel free to point out any mistakes.

Rogue Data Pages

I'm not sure in what way the pages which make up a SQL Server database are initialised. Logic would suggest that there's no point in initialising them at all as the GAM, SGAM, IAM etc. will simply not reference them. But we're not using those structures to identify whether a page is actually in use or not – we're parsing all pages. So is it possible for us to stumble across some random uninitialized page which looks like a data page simply because offset 1 of the page is 0x01? It seems that the answer is yes. I certainly came across some pages which had 0x01 in offset 1 but which were clearly not valid data pages by virtue of the fact that the slot 0 offset (i.e. the last 4 bytes of the page) pointed outside the page, or pointed to a location within the page header. Now, it could certainly be argued that I should really have discarded those in the code I presented in Part 2 of this series. But I didn't, I did it afterwards. So that's why I'm providing this snippet now:

Sunday, 3 January 2010

Minor Changes to Blog

I made some changes to this blog over the last few days that I thought I should just make you aware of in case they cause problems.

Firstly, I added "Continue reading this post here" links to each non-trivial post on the home page. I did this to make it more likely that the search engines would index the actual post pages (which wouldn't simply be considered to be duplicates of the home page) and to reduce the overall size of the home page.

Secondly, I re-directed the Atom and RSS feeds through FeedBurner so I can assess how many subscribers the blog has (Blogger seems to provide no intrinsic support for this).

Finally, I abandoned my use of Word 2007 as my Blogger client as was unhappy with the HTML it was producing. I've re-written the HTML for all the previous posts (whilst, hopefully, not changing any of the actual content) and will now seek out an alternate client.

I don't suppose these changes will cause anyone any problems. But I thought I should mention it anyway.