Designing a Personal Backup Solution

I read Scott’s post on the Backup Rule of Three a while ago, and I’ve since reached a level of redundancy where I’m not scared of the house burning down. But in hindsight, I didn’t really plan how I would implement backups. I just added a cloud storage service, tried to centralise my data so it was easier to back up, and left an encrypted drive at work as an off-site backup.

The announcement of Dropbox’s new 1TB plan got me thinking about this again, since I should be constantly evaluating how much my current services cost me and whether I’m getting the most out of them. So I’ve formalised a process for deciding how to structure my backups, which makes it easier to swap services out while keeping my data protected to the level I want.

The 3-Step Personal Backup Planning Process

Step 1: Data Classification

This bit is the easiest. Simply write down the types of data you have. All of them. This is what I came up with for myself:

  • Passwords (and anything else authentication related, like SSH keys)
  • Crypto Currency Wallets
  • Identity Information (Copies of Driver’s Licence, Passport)
  • Financial Information (Credit Card Information, Banking Information)
  • Source Code (Personal Projects)
  • Personal Records
  • Photos & Home Videos
  • System Images
  • Music
  • Movies
  • TV Shows
  • Games

Step 2: Grouping

Now you need to cluster the data into groups of importance. Consider how much of an inconvenience it would be to try to recover it or how much it would disrupt your life.

Group 1: Severe if Lost. Identity & Finances at stake.

This is data that would put you in a difficult situation if you lost the physical copies or your electronic copies were stolen. For me it’s:

  • Passwords
  • Crypto Currency Wallets
  • Identity Information
  • Financial Information
  • Source Code
  • Personal Records

Group 2: Annoying to Lose. Would take some time to recover.

This is stuff that wouldn’t cause real problems if it were lost or stolen, but you wouldn’t be happy about it, and you wouldn’t want to go through the process of recovering it all. For me it’s:

  • Photos & Home Videos (maybe when I have kids this’ll get pushed up to Group 1)
  • System Images
  • Music (this is borderline Group 3, but it holds more significance for me)

Group 3: Inconvenient, but can be rebuilt slowly over time again.

This is the place for everything else: stuff you can live without and that, honestly, would just be a burden to back up regularly. For me it’s:

  • Movies
  • TV Shows
  • Games
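If you want to keep the result of Steps 1 and 2 somewhere more structured than a note, it boils down to a mapping from data type to group. Here’s a minimal Python sketch using my own lists; the `Group` enum and the `needs_backup` helper are illustrative names, not part of any tool. Filtering out Group 3 gives exactly the list that Step 3 has to cover.

```python
from enum import IntEnum


class Group(IntEnum):
    """Importance groups from Step 2 (lower number = more important)."""
    SEVERE = 1       # identity and finances at stake
    ANNOYING = 2     # would take some time to recover
    REBUILDABLE = 3  # can be rebuilt slowly over time


# Steps 1 and 2: every data type mapped to its group.
CLASSIFICATION = {
    "Passwords": Group.SEVERE,
    "Crypto Currency Wallets": Group.SEVERE,
    "Identity Information": Group.SEVERE,
    "Financial Information": Group.SEVERE,
    "Source Code": Group.SEVERE,
    "Personal Records": Group.SEVERE,
    "Photos & Home Videos": Group.ANNOYING,
    "System Images": Group.ANNOYING,
    "Music": Group.ANNOYING,
    "Movies": Group.REBUILDABLE,
    "TV Shows": Group.REBUILDABLE,
    "Games": Group.REBUILDABLE,
}


def needs_backup(classification: dict[str, Group]) -> list[str]:
    """Data types covered by Step 3 (Groups 1 and 2), most important first."""
    return sorted(
        (name for name, group in classification.items() if group <= Group.ANNOYING),
        key=lambda name: (classification[name], name),
    )


if __name__ == "__main__":
    for name in needs_backup(CLASSIFICATION):
        print(f"Group {CLASSIFICATION[name].value}: {name}")
```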

Step 3: Picking 3 Locations to store the data for the first 2 groups.

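When it comes to backing up the groups above, you can disregard the third group and focus on the first two to keep things simple. For Group 3, maybe set up some sort of RAID on your machine so there’s at least some on-site redundancy. RAID only protects against a drive failing, not against deletion or a house fire, but for data you can rebuild that’s an acceptable trade-off.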

The first location I keep my data on is my on-site copy, a 2TB external hard drive. My advice is to get a bus-powered drive (one that runs off USB power alone); nobody wants to lug an extra power cable around. Group 1 and Group 2 data go on this drive together. Altogether it amounts to around 1.2TB, so I’ve got a fair bit of wiggle room.
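To put a number on that wiggle room, here’s a quick back-of-the-envelope calculation in Python. It assumes the 1.2TB figure is in decimal terabytes, the same units drive makers print on the box. Windows reports sizes in binary units while still labelling them “TB”, which is why a new 2TB drive shows up as roughly 1.82TB.

```python
TB = 10**12   # drive manufacturers quote decimal terabytes
TIB = 2**40   # Windows measures in binary units, though it labels them "TB"


def headroom(capacity_bytes: int, used_bytes: int) -> float:
    """Fraction of the drive that is still free."""
    return (capacity_bytes - used_bytes) / capacity_bytes


capacity = 2 * TB
used = 12 * TB // 10  # about 1.2TB of Group 1 and 2 data (assumed decimal TB)

print(f"Free space: {headroom(capacity, used):.0%}")
print(f"Capacity as Windows reports it: {capacity / TIB:.2f} TB")
```

That comes out at 40% free. If the 1.2TB figure came from Windows Explorer instead, it is really about 1.32TB in decimal terms, which leaves roughly 34% free.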

I have two more of these drives. One is left at the office as my off-site copy, and the third acts as both a transport mechanism between the two and my on-the-go copy. It also doubles as a spare in case of a drive failure.
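The three-drive shuffle is really a small sync protocol. The sketch below models one round trip under my own assumptions: the drive names (A, B, C), the integer `snapshot` version and the `rotation_cycle` helper are hypothetical, and “sync” simply means copying the newer backup set across while both drives are in the same place.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    location: str
    snapshot: int = 0  # version number of the backup set this drive holds


def sync(source: Drive, target: Drive) -> None:
    """Bring target up to date from source; both drives must be in the same place."""
    if source.location != target.location:
        raise ValueError(f"{source.name} and {target.name} are not together")
    target.snapshot = max(target.snapshot, source.snapshot)


def rotation_cycle(current_version: int, home: Drive, office: Drive, transport: Drive) -> None:
    """One hypothetical round trip: back up at home, carry the transport drive to work."""
    home.snapshot = current_version        # back up the PC to the on-site drive
    transport.location = home.location
    sync(home, transport)                  # load the transport drive at home
    transport.location = office.location
    sync(transport, office)                # refresh the off-site copy at the office


home = Drive("A", "home")
office = Drive("B", "office")
transport = Drive("C", "home")
rotation_cycle(7, home, office, transport)
print([(d.name, d.location, d.snapshot) for d in (home, office, transport)])
```

After one cycle all three drives hold the same version, and the transport drive is at the office, ready to travel with me as the on-the-go copy. The `ValueError` check captures the one rule that matters: you can only sync drives that are physically together.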

I encrypt these drives with BitLocker, so if I lose one I’m not fussed (apart from having to buy another). I’m predominantly a Windows guy, so this works out OK. I considered TrueCrypt as a way to sidestep the “rumours” of a “Microsoft escrow key”, but the multi-step process of mounting a drive is too annoying to deal with. I prefer the OS integration and will accept the risk.

So that covers on-site and off-site, but what about the cloud? We are following the Backup Rule of Three after all.

This is where it gets a bit harder. Storing Group 1 data in the cloud should not be taken lightly from a security perspective: there are a number of threats against it that you need to protect yourself from. Here’s what I came up with:

Group 1: SpiderOak. I chose SpiderOak because it’s a zero-knowledge cloud provider. This means no one, not even the provider, can see your data, because the keys protecting it never leave your computer. If you use the “Hive” capability (Dropbox-like functionality), each client negotiates transporting the keys amongst the others, rather than having the server store them.

The downside is that you’ll lose your data if you forget your password, so make sure your password is stored somewhere else just in case.
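To make “zero-knowledge” a bit more concrete, here’s a minimal Python sketch of client-side key derivation using PBKDF2 from the standard library. It is not SpiderOak’s actual scheme; the iteration count, salt size and passwords are assumptions for illustration. The point is that the key is derived from your password on your own machine, so the provider only ever stores the salt and the encrypted data, and nothing on their side can recover a forgotten password. The encryption step itself would use a vetted cipher such as AES-GCM from a crypto library and isn’t shown here.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # PBKDF2 work factor; an assumption, not SpiderOak's setting


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 256-bit key locally; only the salt ever needs to leave the machine."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


salt = os.urandom(16)  # random but not secret; can be stored alongside the data
key = derive_key("correct horse battery staple", salt)

# Re-deriving with the same password and salt reproduces the same key...
assert hmac.compare_digest(key, derive_key("correct horse battery staple", salt))
# ...but a forgotten or mistyped password gives a completely different key,
# and the provider holds nothing that could recover the original.
assert not hmac.compare_digest(key, derive_key("correct horse battery stapel", salt))
```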

And because of this extra layer of security, the application isn’t as stable as other cloud storage tools like Dropbox. On a few occasions it has got stuck on a single file, or taken 10 minutes or more to push files around. I accept this as the price of stronger security.

I’m running the free version of the product. By default it gives you 2GB of storage, but I found a code online that unlocks an extra 4GB. So far that’s sufficient for my Group 1 data. Given the way the industry is going, I’m betting the free plan will be extended soon.

Group 2: CrashPlan. Like SpiderOak, CrashPlan offers zero-knowledge encryption. I have defined my own encryption key for each machine I have it installed on, so the service provider cannot decrypt my files. The flip side is that if I lose a key, nobody can recover that machine’s backup.
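The per-machine keys can be pictured like this: a hypothetical `new_machine_keys` helper that generates an independent random 256-bit key for each machine with Python’s `secrets` module. This is not how CrashPlan generates or stores keys internally, and the machine names are made up. Because the keys are independent, leaking one machine’s key says nothing about the others, and since only you hold them, it pays to keep a copy of each somewhere safe.

```python
import secrets


def new_machine_keys(machines: list[str]) -> dict[str, bytes]:
    """Hypothetical helper: one independent random 256-bit key per machine."""
    return {machine: secrets.token_bytes(32) for machine in machines}


keys = new_machine_keys(["desktop", "laptop", "mac", "server"])

for machine, key in keys.items():
    # Hex form is easy to paste into a password manager as the off-machine copy.
    print(f"{machine}: {key.hex()}")
```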

But the best part is they give you unlimited storage. It’s a bit more than I had hoped to pay overall, but the unlimited storage capability won me over.

All I had to do was install the agent on each of my machines (including my Mac) and select the folders to back up. Because the key is defined in the agent application, I have also set each agent to require my password to access its admin interface. A nice extra layer of protection.

Probably my only problem with it is that it idles at 400MB of RAM, which is nearly a quarter of the RAM on my server.

Threats

As I mentioned earlier, there are a number of threats to your data. I tried to think of a couple that apply to a home user, and this is by no means a complete list. It’s more of a starting point.

Threat 1: Account Hijack

This is somebody using phishing, social engineering or plain old hash cracking to get into one of your online accounts.

If this happened, what would they be able to get to?

What information of yours could they capture? Could they download your backups?

I believe I’m mostly covered against this threat. I use a password manager to make sure all of my passwords are different across services, and they’re over 20 characters long and sufficiently random. If somebody got my CrashPlan account credentials, they wouldn’t be able to read my backups, because they don’t have the decryption keys, and the keys are different for each machine. The best they could do is a denial of service, I guess.
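For a sense of why 20+ random characters is plenty, here’s a small Python sketch of what a password manager does when it generates a password, plus the entropy arithmetic behind it. It assumes every character is drawn uniformly from the 94 printable ASCII symbols; the 24-character default is my own pick, not a setting from any particular password manager.

```python
import math
import secrets
import string

# 52 letters + 10 digits + 32 punctuation marks = 94 printable ASCII symbols
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 24) -> str:
    """Pick each character independently with a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Entropy of a uniformly random password: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


print(generate_password())
print(f"{entropy_bits(20):.1f} bits for 20 characters")
```

At 20 characters that works out to about 131 bits, well beyond the reach of offline hash cracking.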

When it comes to SpiderOak, though, it unfortunately still doesn’t offer multi-factor authentication. I think I may need to look at alternatives. Good thing this blog post is getting me thinking about it.

Threat 2: Unauthorised Access to Home Network

Everything may look good against external threats, but have you considered infected devices joining your home network? Or a malicious neighbour that spends their time cracking your wifi key to snoop around on your network?

To protect myself against this, I’ve placed fairly strong restrictions on what content is available on my home network. The machines are all protected with strong passwords and patched constantly. I’m also considering setting up a Domain Controller so that logons use Kerberos authentication, which is much safer.

Threat 3: Device Theft/Loss

What would happen if your external drive was stolen?

What about your whole computer?

To protect myself against this threat, I’ve run BitLocker on every machine and drive (and FileVault on my Mac), so anything that is powered down can’t be started again without a key. I also recommend against TPM-only BitLocker. It seems counterintuitive to pass up the TPM’s better key storage, but with TPM-only protection the drive unlocks automatically at boot as long as the boot components haven’t changed. That hands a thief a running machine, giving them the opportunity to plug in a device carrying an exploit to gain access to your system.

I prefer to rely on a really strong password and put that in during boot. I protect access to that password instead.

You can go the next level and use TrueCrypt, but I’m not that keen.

It’s best not to store BitLocker recovery keys on flash drives near your computer. Anyone who knows what they’re doing will take those too, to make sure they’ve got the decryption keys.

Threat 4: Device Failure

Have you planned for a drive dying?

What about the Cloud Provider accidentally deleting your content?

It happens. Drives fail all the time. I recall an article about a guy whose Gmail account was accidentally deleted, and Google couldn’t recover it for him. You can’t trust these providers to do their jobs perfectly, especially the free services.

That’s why we use the rule of three for our backup strategy. Problems with your provider? Just swap it out with another. If you’ve followed this process then you’ve got nothing to worry about.
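You can even check that claim mechanically. The Python sketch below encodes the plan from Step 3 and tests each group against the rule, read here (my assumption) as at least three separate locations with at least one of them outside the house. The `without` and `swap` helpers and the replacement provider are hypothetical, just to show what happens when a provider has to go.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    name: str
    kind: str  # "on-site", "off-site" or "cloud"


HOME = Location("External drive at home", "on-site")
OFFICE = Location("External drive at the office", "off-site")
SPIDEROAK = Location("SpiderOak", "cloud")
CRASHPLAN = Location("CrashPlan", "cloud")

# The plan from Step 3: which locations hold each group.
PLAN = {1: [HOME, OFFICE, SPIDEROAK], 2: [HOME, OFFICE, CRASHPLAN]}


def follows_rule_of_three(locations: list[Location]) -> bool:
    """Assumed reading of the rule: three separate locations, at least one away from home."""
    return len(set(locations)) >= 3 and any(loc.kind != "on-site" for loc in locations)


def without(plan: dict[int, list[Location]], failed: Location) -> dict[int, list[Location]]:
    """The plan after a location is lost (drive failure, provider problem, ...)."""
    return {group: [loc for loc in locs if loc != failed] for group, locs in plan.items()}


def swap(plan: dict[int, list[Location]], old: Location, new: Location) -> dict[int, list[Location]]:
    """The plan after replacing one location with another wherever it is used."""
    return {group: [new if loc == old else loc for loc in locs] for group, locs in plan.items()}


if __name__ == "__main__":
    replacement = Location("Hypothetical replacement provider", "cloud")
    for label, plan in [("current", PLAN),
                        ("SpiderOak lost", without(PLAN, SPIDEROAK)),
                        ("SpiderOak swapped", swap(PLAN, SPIDEROAK, replacement))]:
        print(label, {g: follows_rule_of_three(locs) for g, locs in plan.items()})
```

Losing SpiderOak drops Group 1 to two copies while Group 2 is unaffected, and swapping in any other cloud location brings it back into line.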