SQL Server Differential Backups and Primary Key choice!

I’ve just set up a weekly full database backup and daily differential daily backups thinking that this would not only speed the daily process up but also reduce the amount of data that’s flying around the servers since our backups are topping 85gigs. However, I was disappointed to notice that the first differential backup containing only a days worth of transactions is already 27gigs even though the quantity of transactions is a lot less.

A quick google and a ‘eureka’ moment seems to have highlighted the problem. A differential backup identifies the data that has changed since the last backup but backs up the entire Extent that the data exists in. (An Extent is 64kb and contains 8 Pages where each Page is 8kb.) That would have been fine except the primary keys we are using are generated as unique guids – which means any data added to a table during the day is randomly, and probably evenly, distributed across the Extents. That’s the ‘eureka’ moment… the differential backup is identifying sparsely updated Extents and so artificially inflating the backup.

Along with the other performance problems we’ve had in the past this is lending more weight to using something along the lines of newsequentialid() function but there’s more details blogged by Jimmy Nilsson which details some alternatives.