Sunday 13 December 2009

Undropping a SQL Server Table - Part 1

Introduction

Have you ever accidentally dropped a SQL Server table? I have. In fact I didn't just drop a SQL Server table, I dropped all of them. In my defence, I was simply running a script written by another development team in order to deploy their objects into an existing database. What I hadn't realised was that their script expected to be deployed into a clean database – one without any existing objects – and so the first thing it did was cursor through sys.objects dropping everything it found. Somewhat embarrassing, but not the end of the word. I immediately contacted our DBA and requested a point-in-time restore. That's where things really started to go wrong. It turns out that when the creation of this particular database was requested, that request didn't make any mention of a backup strategy. So there wasn't one. Hmmm.

Now, whilst there was nothing business critical on this database, there was quite a lot of data which we very much wanted to recover. (By 'quite a lot' I mean 10GB or so.) If you've ever been faced with this situation, you'll know that there are several recovery tools on the market which can help. I looked at RedGate's SQL Log Rescue but that only supports SQL Server 2000 (I was on SQL Server 2005). I looked at Lumigent's Log Explorer (part of Audit DB) but was led to believe by several posts on-line that it only supported SQL Server 2000 – Lumigent themselves are somewhat vague on the subject. ApexSQL Recover, on the other hand, looked just the ticket. It supported SQL Server 2005, had good reviews, and provided a try-before-you-buy version which would recover 1 in 10 rows. The organisation that I was working within had a very strict policy about installing third-party applications which hadn't been through a centralised testing, approval, and packaging process. The policy was simple – you don't do it. Undeterred, I managed to obtain quasi-permission to install the trial version on my desktop with a view to going through the formal approval process if the results looked promising. They didn't. Once ApexSQL Recover was installed it contacted the database server and attempted to install some extended stored procedures. Not only had I not gained permission to install anything on the database server, I knew that extended stored procedures were specifically forbidden.

After taking my head out of my hands, believing the data to be lost forever, I starting thinking about how the recovery tools worked. I figured, if they can do it – why can't I? So, this post is the first of a series on how to undrop SQL Server tables the hard way.

Caveat: This post, and the series of which it forms a part, applies to SQL Server 2005. Your mileage with other versions may vary.

Parsing the Transaction Log

The blurb on the ApexSQL web site mentioned that they use the transaction log to recover dropped tables, so that's what I tried first. SQL Server 2005 has an undocumented user-defined function called fn_dblog within the master database. The function accepts a starting and ending Log Sequence Number (LSN) and returns all entries within the transaction log of the current database between those LSNs. If we pass NULL for both LSNs we get all entries in the transaction log.

SELECT * FROM dbo.fn_dblog(NULL,NULL)

Except we don't quite get all entries. We get all entries since the last CHECKPOINT. I won't detail all the circumstances under which a CHECKPOINT is issued, but from my own experimentation with this function is was clear that a CHECKPOINT had been issued subsequent to me having dropped the tables. As a result the only entries I could see in my transaction log were those relating to the CHECKPOINT operation itself. Perhaps this is why ApexSQL wanted to install those extended stored procedures, so it could access the transaction log directly and look 'through' the CHECKPOINTs. I don't know. All I know is that using the transaction log to undo the effect of all those DROP TABLE statements wasn't going to work. I needed another plan.

Drop Table != Destroy Data

Most of us are familiar with the concept that when you delete a file from disk, nothing actually goes through the file and scrubs out its contents – all that happens is that the space previously occupied by the file is marked as available for re-use. It seemed reasonable to me that SQL Server would work in a similar way – after all, when you drop a table containing a million rows the operation completes almost instantly. If this theory were correct, it would suggest that the data in the tables I'd dropped was still present – the pages upon which that data used to reside had just been marked as available for use.

SQL Server stores details about page usage in the Global Allocation Map (GAM), the Shared Global Allocation Map (SGAM) and the Index Allocation Map (IAM) structures. I reasoned that if I'd dropped a table, the GAM, SGAM and IAM structures would all have been updated. A quick test showed that this was indeed the case. So all I was left with were the data pages themselves.

DBCC PAGE

SQL Server has an undocumented DBCC command, DBCC PAGE, which can return the raw data sitting on a specified page – even if that page doesn't logically belong to any table or internal structure. It accepts four parameters:

  • 'dbname' or dbid – the name or id the database,
  • filenum – the id of the file containing the page,
  • pagenum – the id of the page within that file, and
  • printopt – the level of detail (0 through 3) required in the output.

The values for printopt are interpreted as follows:

  • 0 – output the page header only,
  • 1 – output the page header plus hexadecimal dumps of each row on the page,
  • 2 – output the page header plus a hexadecimal dump of the entire page, or
  • 3 – output the page header plus an interpretation of each row, performed via the meta-data available for the table.

I need to thank Paul S. Randall at this point. His In Recovery... blog, particularly the On-Disk Structures section, and his posts on the SQL Server Storage Engine blog were invaluable in understanding how to use DBCC PAGE and how to understand the structure of a page.

It used to be that you needed to set trace flag 3604 in order to access the output from DBCC PAGE, but these days you can simply append the WITH TABLERESULTS clause. For example, the following statement will provide a fully interpreted dump of the contents of page 16 of file 1 of the master database.

DBCC PAGE ('master', 1, 16, 3) WITH TABLERESULTS

In tests on a separate database I established that if I created a table, added some rows, and then dropped that table I could still see the data via DBCC PAGE. Okay, the data wasn't in a very readable format, but it was clearly still there. So, DBCC PAGE was going to be how I would recover the data. (Incidentally, I found Internals Viewer for SQL Server, written by Danny Gould, to be a very useful tool in these initial stages, primarily to establish which page actually contained my test data and to help understand some of the on-disk structures. The tool is great – I recommend it.)

Now, in terms of the values I could use for DBCC PAGE's printopt parameter, 0 wasn't much use as it provided row header information only and 3 relied upon the table meta-data which had obviously been lost during all those DROP operations. This left me with a printopt of 1 or 2. At the time I didn't pay enough attention to the fact that a printopt of 1 actually splits the page into individual rows – obviously very useful. I'd stupidly scanned through the list and assumed that the level of detail got greater as the printopt value got higher, and so upon realising that 3 was no good to me because of the lack of meta-data, I decided that a printopt of 2 was what I wanted. As a result, I ended up splitting the page into individual rows myself which I really didn't need to do. If you're planning on using any of what I discuss in this series of posts yourself, you're probably better off with a printopt of 1.

I'm going to skip ahead in the timeline a little here because when I came to use DBCC PAGE on the actual database containing the dropped tables (as opposed to the database I was testing this on) I hit a snag, and it makes sense to cover-off that issue now. DBCC PAGE can only be executed by members of the sysadmin fixed server role, and I wasn't a member of that role. I was a dbo on the database itself, but not a sysadmin on the server. Even given the nature of the problem that I was attempting to solve I knew that I wasn't going to be granted membership of sysadmin: there were other databases on the server to which I had no access and the sysadmin role was way too powerful to be granted to the likes of me. Our DBA agreed that if I could create a stored procedure which wrapped the call to DBCC PAGE and ensured it only accessed my database, then he would grant that stored procedure rights to run as sysadmin. So I created one.

Clearly we can't create anything in our 'empty' database for fear of overwriting a data page we're trying to recover, so we need to work elsewhere. In these examples I'll be working in a new database called DataRecovery, but any database on the same server as the 'empty' database will work just as well.

CREATE DATABASE DataRecovery
GO

USE DataRecovery
GO

CREATE PROCEDURE dbo.pDBCC_PAGE_Lazarus
(
  @FileId INT,
  @PageId INT
)
AS
  DBCC PAGE ('Lazarus', @FileId, @PageId, 2) WITH TABLERESULTS
GO

GRANT EXECUTE ON dbo.pDBCC_PAGE_Lazarus TO [SomeDomain\Ian.Picknell]
GO

Clearly the database which 'lost' its tables wasn't really called 'Lazarus' but that's the name I'll use throughout these posts.

WITH EXECUTE AS

The CREATE PROCEDURE statement within SQL Server has an optional WITH EXECUTE AS clause which can be used to set the security context under which to execute the stored procedure. The default is CALLER but SELF can be specified to cause the stored procedure to be executed under the security context of the user who created the stored procedure, rather than the user who called it. Although I could successfully use WITH EXECUTE AS to change the security context for a sample stored procedure, I couldn't get it to work for the DBCC PAGE wrapper – I kept being told that I did not have permission to execute the DBCC PAGE command. Although I couldn't prove that this was the case, I eventually reached the conclusion that as a security context change initiated via WITH EXECUTE AS may not span databases, and as DBCC PAGE can be used to access data on an arbitrary database, all calls to DBCC PAGE are assessed using the security context of the caller (even if the first parameter being passed to DBCC PAGE is actually the current database, meaning no spanning of databases actually occurs).

Using Certificates

An alternative to using WITH EXECUTE AS is to sign the procedure with a certificate and then assign rights to that certificate: the caller gains those rights during the execution of the procedure. The process is slightly convoluted, and you can't actually assign rights directly to the certificate, so let's walk through the steps involved.

First, we need to create a certificate with which to sign pDBCC_PAGE_Lazarus. Note that we protect the certificate's private key with a password, so it's useless to anyone who doesn't know the password. The use of a password here isn't really necessary as the certificate will be rendered very useless in a moment when we actually remove the private key.

CREATE CERTIFICATE cert_pDBCC_PAGE_Lazarus ENCRYPTION BY PASSWORD='AStr0ngPa55word!' WITH SUBJECT='pDBCC_PAGE_GFRM_Lazarus Signing Certificate', START_DATE='09/14/2009'

So now we can sign the stored procedure with the certificate.

ADD SIGNATURE TO dbo.pDBCC_PAGE_Lazarus BY CERTIFICATE cert_pDBCC_PAGE_Lazarus WITH PASSWORD='AStr0ngPa55word!'

Now that we've used the certificate to sign the stored procedure we can actually remove the certificate's private key. This renders the certificate impotent from the perspective of signing further stored procedures. Remember that we're going to ask our DBA to assign sysadmin rights to the certificate, so we need to ensure that the certificate can't be used to sign other arbitrary stored procedures.

ALTER CERTIFICATE cert_pDBCC_PAGE_Lazarus REMOVE PRIVATE KEY

Despite what I implied earlier, rights cannot actually be assigned to certificates – they can only be assigned to logins. So SQL Server makes us create a new login from the certificate and then assign rights to that login. The login can't actually be used to login to the server. As the documentation puts it: "Logins created from certificates or asymmetric keys are used only for code signing. They cannot be used to connect to SQL Server." It then goes on to say: "You can create a login from a certificate or asymmetric key only when the certificate or asymmetric key already exists in master." That's unfortunate – we can only create a certificate-based login if that certificate exists in the master database. So we're going to need to copy our certificate over from DataRecovery to master. Oddly, although you can copy certificates between databases you have to do so by exporting them to disk from one database and the importing them from disk into the other database.

  BACKUP CERTIFICATE cert_pDBCC_PAGE_Lazarus TO FILE='cert_pDBCC_PAGE_Lazarus.cer'

By default, the certificate will be backed-up to the DATA directory (typically C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data or similar).

This is the point at which we need to hand over the reins to our DBA. First, we need to ask our DBA to import the certificate from disk into the master database.

USE master
GO

CREATE CERTIFICATE cert_pDBCC_PAGE_Lazarus FROM FILE='cert_pDBCC_PAGE_Lazarus.cer'

Then we ask the DBA to create a login from that certificate. Again, this login can't be used to actually login – it's just used to connect a certificate to a role.

CREATE LOGIN cert_pDBCC_PAGE_Lazarus FROM CERTIFICATE cert_pDBCC_PAGE_Lazarus

Now that we've got our certificate-based login, we can ask our DBA to make it a member of the sysadmin fixed server role.

EXEC sp_addsrvrolemember 'cert_pDBCC_PAGE_Lazarus', 'sysadmin'

Summary

The net result of all the above is that we now have a stored procedure, pDBCC_PAGE_Lazarus, which anyone can call to execute a DBCC PAGE command against a specified page in a specified file on the 'Lazarus' database. In my next post I'll describe how we can use this procedure to start the process of recovering our dropped tables.

See Also

No comments:

Post a Comment