Tim's Blog

Information, Technology, Security, and other stuff.

A Personal Backup Solution Updated

Published 2016-12-07

Since moving most of my homelab services to Docker, I decided to re-evaluate (again) how I perform backups.

With containers hosting most of my services, I really don't care what happens to the actual container or the host; I just care about the persistent data they create. I've now taken the same approach with the rest of my data to simplify my life. Simplicity === bliss.

I used to perform full backups of my machines. I'll still do that for my primary domain controller (I don't want to have to rebuild my domain), but everything else will only get backups of its persistent data. I'll cover each type of backup under my new regime in this post. It's also worth mentioning that I have a NAS of sorts at home (an HP ProLiant N40L running Server 2012 R2), which is the local destination for all of these backups. If you don't have a NAS, I recommend building one, buying one, or just designating some always-on machine in your environment as the backup hub.

My standard approach across the board is:

  1. 7zip up the necessary files with encryption enabled
  2. Save the 7zip archive to my NAS
  3. Upload the 7zip archive to Google Drive as well
  4. Clean up
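Every script below builds its archive name inline as NAME-backup-YYYY-MM-DD.7z, and some of them call date more than once for the same file. A minimal sketch of a helper that builds the path a single time, so one run can't end up referring to two different filenames if it straddles midnight (backup_path and its optional date argument are hypothetical, not part of the original scripts):

```shell
#!/usr/bin/env bash
# backup_path NAME DIR [DATE]
# Prints DIR/NAME-backup-DATE.7z. DATE defaults to today's date in the
# same YYYY-MM-DD format the backup scripts use; a trailing slash on DIR
# is dropped so the result never contains "//".
backup_path() {
  local name=$1 dir=$2
  local day=${3:-$(date '+%Y-%m-%d')}
  printf '%s/%s-backup-%s.7z\n' "${dir%/}" "$name" "$day"
}
```

In a script you would capture it once, e.g. archive=$(backup_path docker1 /tmp), and then use "$archive" for the 7za, cp and gdrive steps.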

7zip is my primary backup storage method because:

  • it's cross-platform,
  • it compresses really well,
  • it allows encryption of archives (with AES-256),
  • it has a fully featured CLI, and
  • (the best part) it's free!

The drives I store the backups on are all encrypted with BitLocker, but it's safest to encrypt the backup files themselves as well, in case they end up in the wrong hands while a drive is unlocked.
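Every 7za command in this post has its password written directly into the script. One alternative, sketched here under the assumption that you keep the password in a file that only the backup account can read (the file path and the function name read_backup_password are hypothetical), is to read it at run time and refuse to continue if the file's permissions are too loose. The password is still handed to 7za as a -p argument, so it remains visible in the process list while 7za runs; this only keeps it out of the script itself. The permission check uses GNU stat, so it assumes a Linux host.

```shell
#!/usr/bin/env bash
# read_backup_password FILE
# Prints the first line of FILE, but only if neither group nor others have
# any permission bits on it (e.g. mode 600 or 400). Relies on GNU stat -c.
read_backup_password() {
  local file=$1 mode line
  mode=$(stat -c '%a' -- "$file") || return 1
  # The last two octal digits are the group and other permissions.
  if [[ ${mode: -2} != 00 ]]; then
    echo "refusing to use $file: mode $mode gives group/others access" >&2
    return 1
  fi
  # Read the first line even if the file has no trailing newline.
  IFS= read -r line < "$file" || [[ -n $line ]] || return 1
  printf '%s\n' "$line"
}
```

Usage would look like password=$(read_backup_password /root/.backup-password) || exit 1, followed by -p"$password" in the 7za command.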

Docker Services

I went through this a bit in a previous post, but here's where I'm at now with backing it up. Because all configuration and persistent data lives within a single directory, I simply zip that up and push it to my local NAS first and then to Google Drive:

#!/bin/bash

cd /var/docker/compose || exit 1

# build the archive name once so every step uses the same file
archive=/tmp/docker1-backup-$(date '+%Y-%m-%d').7z

# stop the containers
docker-compose stop

# zip them up encrypted
7za a -t7z -mx2 -mhe=on -pAReallyGoodPasswordStoredSomewhereSafe "$archive" /var/docker/

# start the containers
docker-compose start

# copy to local archive
cp "$archive" /mnt/backups/

# upload to google
gdrive upload --parent AhgfjkhfwdaklbaHASFHASFhqe90huasHFQJah9qehqAJHQ --delete "$archive"

exit 0
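The scripts in this post always finish with exit 0, so a failed 7za run goes unnoticed by cron or whatever else schedules them. A minimal sketch of a bash wrapper that stops a service, runs the archiving command, always starts the service again, and then passes the archiver's exit status back to the caller (with_stopped, compose_stop and compose_start are hypothetical names):

```shell
#!/usr/bin/env bash
# with_stopped STOP_FN START_FN CMD [ARGS...]
# Runs STOP_FN, then CMD, then START_FN (even if CMD failed), and returns
# CMD's exit status. If STOP_FN fails, nothing else is run.
with_stopped() {
  local stop_fn=$1 start_fn=$2 rc=0
  shift 2
  "$stop_fn" || return 1
  "$@" || rc=$?
  "$start_fn" || echo "warning: $start_fn failed" >&2
  return "$rc"
}

# Example wrappers for the Docker host (compose project in /var/docker/compose).
compose_stop()  { (cd /var/docker/compose && docker-compose stop); }
compose_start() { (cd /var/docker/compose && docker-compose start); }

# Usage, with $archive and $password set beforehand:
#   with_stopped compose_stop compose_start \
#     7za a -t7z -mx2 -mhe=on -p"$password" "$archive" /var/docker/ || exit 1
```

The same wrapper works for Plex or Confluence by defining stop/start functions around systemctl instead.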

Plex

I have Plex running in its own VM. Not because I can't containerise it, but because the Docker images for Plex don't include the Plex Pass version and I'm not particularly interested in defining my own container. For now it'll live in its own world, and its backup is effectively the same as the Docker machine's:

#!/bin/bash

# stop plex
systemctl stop plexmediaserver

# back it up
7za a -t7z -mx2 -mhe=on -pSeriouslyYouBetterBeUsingAStrongPassword /mnt/backups/plex-backup-$(date '+%Y-%m-%d').7z /var/lib/plexmediaserver/

# start plex
systemctl start plexmediaserver

exit 0

But as you can see, I omit the step of sending it to Google Drive. It's not a big deal to me if I lose these backups; they're more of a convenience than anything.
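The Plex script writes its archive straight into /mnt/backups, and the other scripts copy into it. If the NAS share happens not to be mounted when the job runs, that path is just an empty local directory and the archive quietly lands on the VM's own disk. A minimal guard, assuming /mnt/backups is a mount point and that the mountpoint command (from util-linux) is installed; require_mount is a hypothetical name:

```shell
#!/usr/bin/env bash
# require_mount DIR
# Succeeds only if DIR is currently a mount point, so a backup is never
# written into the empty directory left behind by an unmounted share.
require_mount() {
  if ! mountpoint -q "$1"; then
    echo "error: $1 is not mounted, skipping backup" >&2
    return 1
  fi
}

# Usage near the top of a backup script:
#   require_mount /mnt/backups || exit 1
```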

Confluence

I've covered my Confluence backup procedure before, but here it is simplified with 7zip instead. It's just like the Plex backup, with an extra step for backing up the database with pg_dump:

#!/bin/bash

# build the archive name once so every step uses the same file
archive=/tmp/confluence-backup-$(date '+%Y-%m-%d').7z

# stop confluence to prevent corruption
systemctl stop confluence

# back up the database and the confluence home directory
pg_dump confluence -U confluenceadmin -h localhost > /tmp/confluence.sql

7za a -t7z -mx2 -mhe=on -pIShouldntHaveToTellYouAgainToMakeThisAStrongPassword "$archive" /var/atlassian/application-data/confluence/ /tmp/confluence.sql
rm -f /tmp/confluence.sql

# start confluence
systemctl start confluence

# copy to local backup
cp "$archive" /mnt/backups/

# upload to google drive
gdrive upload -p AhgfjkhfwdaklbaHASFHASFhqe90huasHFQJah9qehqAJHQ --delete "$archive"

exit 0

This runs 3 times a week, as each backup is only about 2GB in size and the data changes fairly frequently.
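The Confluence script writes an unencrypted pg_dump of the whole database to /tmp/confluence.sql, where, depending on the umask, other local users may be able to read it until it's deleted. A minimal sketch, assuming GNU coreutils, of running that step inside a fresh directory only the backup user can enter and removing the directory afterwards whether or not the step succeeded (with_private_tmpdir and dump_and_archive are hypothetical names; ARCHIVE and PASSWORD are assumed to be set earlier in the script):

```shell
#!/usr/bin/env bash
# with_private_tmpdir CMD [ARGS...]
# Creates a new temporary directory with mode 700, runs CMD with that
# directory's path appended as its last argument, then deletes the
# directory and returns CMD's exit status.
with_private_tmpdir() {
  local dir rc=0
  dir=$(mktemp -d) || return 1
  chmod 700 "$dir" || { rm -rf -- "$dir"; return 1; }
  "$@" "$dir" || rc=$?
  rm -rf -- "$dir"
  return "$rc"
}

# Example step: dump the database into the private directory, then archive
# it together with the Confluence home directory.
dump_and_archive() {
  local work=$1
  pg_dump confluence -U confluenceadmin -h localhost > "$work/confluence.sql" &&
    7za a -t7z -mx2 -mhe=on -p"$PASSWORD" "$ARCHIVE" \
      /var/atlassian/application-data/confluence/ "$work/confluence.sql"
}

# Usage: with_private_tmpdir dump_and_archive || exit 1
```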

Personal Data

I used to use a combination of Google Drive and Boxcryptor to store all of my personal data, like receipts, document scans, personal projects, etc. A few months after setting it up, though, I found I'd basically stopped using Boxcryptor, as I'd moved anything useful into my Confluence instance, so I decided to ditch it to save a bit of money.

I still like Boxcryptor as a product and would still recommend it, but it just didn't fit in my workflow, so I migrated everything out to a private shared folder on my NAS, and now I just back it up on the NAS with PowerShell:

$date = Get-Date -Format "yyyy-MM-dd"
$backupDirectory = 'D:\Personal\Tim'
$backupFilename = 'D:\Personal\tim-backup-' + $date + '.7z'

7z a -t7z -mx2 -mhe=on -pYupItsAnotherStrongPassword $backupFilename $backupDirectory

# copy to external drive
Copy-Item $backupFilename E:\

# upload to google
gdrive upload --parent AhgfjkhfwdaklbaHASFHASFhqe90huasHFQJah9qehqAJHQ --delete $backupFilename

This backup happens once a week, because it's around 15GB in size and doesn't change much. You might also notice that 7zip is invoked as 7z rather than 7za here; that's just the command I ended up with after installing it with Chocolatey on Windows.

Managing the Backups

These backups all land in a shared folder on my NAS aptly titled "Archive". I need to make sure they also get stored off-site, though, so I have a script that runs every 15 mins and moves a file from that folder to an external hard drive (one file per run, so the folder empties out over successive runs):

# Only look at files; -File needs PowerShell 3.0 or later
$files = @(Get-ChildItem D:\Share\Archive -File)

if ($files.Count -gt 0)
{
    # Move one file per run; the next run picks up the next one
    $files | Select-Object -First 1 | Move-Item -Destination E:\ -Force
}

External drives of course have limited capacity (mine are between 1 and 2TB), so I have another PowerShell script that also runs every 15 mins (offset by 5 mins from the previous one) and deletes the oldest file whenever the drive has less than 100GB of free space:

# Get the free space remaining on the drive
$disk = Get-WmiObject Win32_LogicalDisk -Filter "DeviceID='E:'"

# If we have less than 100GB of space left (1GB is 1024^3 bytes in PowerShell)
if (($disk.FreeSpace / 1GB) -lt 100)
{
    # Delete the oldest item on the disk
    Get-ChildItem E:\ | Sort-Object LastWriteTime | Select-Object -First 1 | Remove-Item -Force
}

With a 100GB buffer, I can be confident that a newly created backup won't fail because of a full disk, and the disks float around 90-95% full.
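A quick check of the 90-95% figure. The script measures free space in binary gigabytes (1024^3 bytes), so the threshold is really 100 GiB, while drive capacities are sold in decimal units (1 TB = 10^12 bytes). Working that through for a 1TB and a 2TB drive:

```latex
\begin{aligned}
1\,\text{TB} &= \frac{10^{12}}{2^{30}}\ \text{GiB} \approx 931.3\ \text{GiB},
  & 1 - \frac{100}{931.3} &\approx 0.893 \quad (89.3\%\ \text{full}) \\
2\,\text{TB} &= \frac{2 \times 10^{12}}{2^{30}}\ \text{GiB} \approx 1862.6\ \text{GiB},
  & 1 - \frac{100}{1862.6} &\approx 0.946 \quad (94.6\%\ \text{full})
\end{aligned}
```

So the drives settle at roughly 89-95% full, slightly lower on the smaller drives than the decimal figures suggest.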

These PowerShell scripts don't trip over files that are still being written, because the writing process holds a lock that prevents them from being moved or deleted. So far it's worked well for me to just leave the scripts running on a schedule rather than building them into the backup scripts themselves.
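If your backup hub runs Linux rather than Windows, the same two scheduled jobs can be written in bash, with one important difference: Linux doesn't lock a file that's still being written, so a move or delete would happily act on a half-written archive. A minimal sketch that compensates by only moving files that haven't been modified for a while; the 30-minute settle time, the paths in the usage comment, and the function names are assumptions, and it relies on GNU find and df:

```shell
#!/usr/bin/env bash
# move_one_settled SRC DST [MINUTES]
# Moves the oldest regular file in SRC that hasn't been modified for more
# than MINUTES (default 30) into DST. Does nothing if no file qualifies.
move_one_settled() {
  local src=$1 dst=$2 minutes=${3:-30} file
  file=$(find "$src" -maxdepth 1 -type f -mmin +"$minutes" -printf '%T@ %p\n' \
    | sort -n | head -n 1 | cut -d' ' -f2-)
  [[ -n $file ]] || return 0
  mv -f -- "$file" "$dst"/
}

# prune_oldest DIR MIN_FREE_GIB
# Deletes the oldest regular file in DIR if the filesystem holding DIR has
# less than MIN_FREE_GIB GiB (1024^3 bytes) available.
prune_oldest() {
  local dir=$1 min_gib=$2 avail file
  avail=$(df --output=avail -B1 -- "$dir" | tail -n 1)
  avail=${avail//[[:space:]]/}
  [[ $avail =~ ^[0-9]+$ ]] || return 1
  (( avail < min_gib * 1024 ** 3 )) || return 0
  file=$(find "$dir" -maxdepth 1 -type f -printf '%T@ %p\n' \
    | sort -n | head -n 1 | cut -d' ' -f2-)
  if [[ -n $file ]]; then
    rm -f -- "$file"
  fi
}

# Example scheduled calls (placeholder paths):
#   move_one_settled /srv/archive /media/external
#   prune_oldest /media/external 100
```

Filenames containing newlines would confuse the find | sort pipeline; backup archive names like the ones in this post don't have that problem.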

That's It

It's pretty simple, but I'm liking the simplicity. The external backup drive just keeps getting new data added to it while the old stuff is purged. Whenever I think of it, I swap it out for a different drive so that one is always off-site, adhering as closely as I can to the 3-2-1 backup rule.

I'm lucky enough to have an unlimited data internet connection at home (yay for FTTP NBN), so I'm not worried about pushing gigs of data up to the cloud. If you're in a more data-constrained environment (e.g. the rest of Australia, Canada, rural USA), you can make the cloud uploads less frequent. You could also omit them entirely and swap out the external drive plugged into your NAS more often, but then you're at a bit more risk of losing data.
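One way to make the cloud uploads less frequent without changing how often the local backups run, sketched under the assumption that you keep the existing scripts and only wrap the gdrive upload step: check the day of the week first. UPLOAD_DAYS, upload_due and FOLDER_ID are hypothetical names; date +%u prints 1 for Monday through 7 for Sunday.

```shell
#!/usr/bin/env bash
# Space-separated ISO weekday numbers on which to upload (default: Sunday).
UPLOAD_DAYS=${UPLOAD_DAYS:-"7"}

# upload_due [DAY]
# Succeeds if DAY (1 = Monday ... 7 = Sunday; defaults to today) is listed
# in UPLOAD_DAYS.
upload_due() {
  local today=${1:-$(date +%u)} day
  # UPLOAD_DAYS is deliberately unquoted so it splits into separate days.
  for day in $UPLOAD_DAYS; do
    [[ $day == "$today" ]] && return 0
  done
  return 1
}

# Usage in a backup script, after the copy to /mnt/backups:
#   if upload_due; then
#     gdrive upload --parent "$FOLDER_ID" --delete "$archive"
#   else
#     rm -f "$archive"
#   fi
```

On the other days the temporary copy is simply removed, since the NAS already holds the local copy.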