Robin Lee Powell's OLD Backup Method

THIS IS OBSOLETE.

Please see my new backup method.

Introduction

I have a method of personal data backup that seems to be fairly novel, in that it has all of the following properties:

None of these things are especially exceptional, but the combination of all of them is rare enough that, in searching for a good solution, I've found a number of people also looking for this feature set (and largely failing to find it, by the way). It's actually the multiple-OS part that makes it most complicated; backing up a lone Linux box is just a matter of finding a good Amazon S3 tool.

The problem with multi-OS backups is that you need a solution that preserves all the metadata that different OSes and file systems have (in particular: NTFS file permissions are far better in most respects, and also far more complex, than Linux file permissions, unless you use getfacl/setfacl, which no-one ever does, and even then NTFS is still better in some respects, like cascading permissions).

The problem with the backup solution I use, BackupPC, is that it likes hard links. A lot. No non-UNIX file system understands these and, in particular, that means Amazon's S3 is right out. I'm not aware of any Linux multi-OS backup system that doesn't suck that also doesn't have this problem; that is, such systems make few assumptions about the file system they're backing up, but a lot of assumptions about the file system they're backing up to. The big advantage of BackupPC's hard link system is that it minimizes the size of each new backup, whilst still retaining the enterprise-level tiered backup system we've all come to know and, umm, tolerate.
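
To see why the hard links matter so much, here's a minimal sketch (with made-up paths, not BackupPC's real pool layout) of where the space savings come from: repeated backups of an unchanged file are just extra directory entries pointing at the same inode, and that inode sharing simply does not survive a copy to something like S3.

mkdir -p pool backup.0 backup.1
echo "unchanged file contents" > pool/file.dat   # the one real copy of the data
ln pool/file.dat backup.0/file.dat               # first backup: just another link
ln pool/file.dat backup.1/file.dat               # second backup: another link
ls -l pool/file.dat              # link count 3; the data blocks exist exactly once
du -sh pool backup.0 backup.1    # du counts the shared blocks only once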

The final problem, and actually the worst, is that there exists no software that I'm aware of that will do all of the following things:

When, like me, you dig into this problem because you're stupid enough to try writing such software yourself, you discover that it is Surprisingly Hard. The reason is that last step: update detection. By definition, non-broken ciphertext looks random; it certainly doesn't in any way resemble the original file. Date-based update detection is totally unacceptable, because it doesn't deal with the possibility of corruption during transit; if that happens on the first run, you want the second run to correct it. The solution I was working on was rather grotesque, and I quit when I realized that BackupPC relied on hard links, which I had no reasonable way to support.

The rest of this page is about how I solved these problems.

The Machines

The Software

The Process

Set Up The File System

Make the file for the sparse file system; this command may not be exactly right, as I stupidly didn't make notes when I did this:

dd if=/dev/zero of=SPARSE_FILE bs=1M seek=32768 count=1

This creates a 32GiB sparse file; "du -sk SPARSE_FILE" should report a 1MiB file and "du -sk --apparent-size SPARSE_FILE" should report a 32GiB file.
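
For the record, the two checks look like this; the numbers are approximate (du -sk reports KiB, and the dd above technically leaves the apparent size at 32GiB plus the single 1MiB block it wrote):

du -sk SPARSE_FILE                   # allocated size: roughly 1024 (1MiB)
du -sk --apparent-size SPARSE_FILE   # logical size: roughly 33554432 (32GiB)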

Create the file system key:

umask 077
head -c 2925 /dev/random | uuencode -m - | head -n 66 | tail -n 65 \
    | gpg --s2k-count 8388608 --symmetric -a >FS_KEY

Use a very, very long passphrase; I suggest over a hundred characters. Ideally, use the output of "makepasswd --chars=128". The reason is that this will be a script-managed file system; you should never, ever type in this password by hand.

Create the actual file system:

losetup -e AES128 -K FS_KEY /dev/loop0 SPARSE_FILE
mke2fs -j /dev/loop0
losetup -d /dev/loop0

Add it to fstab:

# To fsck /backups:
#  losetup -e AES128 -K FS_KEY /dev/loop6 SPARSE_FILE
#  e2fsck /dev/loop6
SPARSE_FILE  /backups    ext2    defaults,ro,noauto,loop,encryption=AES128,gpgkey=FS_KEY     0       0

And mount:

mount /backups
[paste in the password]

You should now be able to tool around in /backups; it's a regular file system, just backed by a single file.

Synchronize BackupPC Data

I have this in my root crontab on LOCAL:

# Daily backups, but try to avoid backuppc run times
12 12 * * *     /usr/bin/nice /home/zroot/backup_fs_based && /bin/cat /tmp/backup_fs_based.log

Here's the backup_fs_based script as of 5 Oct 2007, with some privacy modifications:

#!/bin/sh

export PATH=/bin:/usr/bin:/sbin:/usr/sbin

# After 5 hours, assume the other rsync is dead or something.
lockfile -l 18000 -r 1 /var/tmp/lock.backup_fs_based

if [ $? -ne 0 ]
then
	echo "Could not obtain lock; exiting."
	exit
fi

exec 1>/tmp/backup_fs_based.log
exec 2>&1

echo ; echo -n "date 1: " ; date ; echo

# Stop backuppc
/etc/init.d/backuppc stop

# mount our local copy of the encrypted file system
echo THE_PASSWORD | mount -o rw,loop /backups -p -

echo ; echo -n "date 2: " ; date ; echo

# rsync backuppc to our local encrypted file system; -H is *VERY IMPORTANT*; -S almost as much
rsync --delete -S -H --no-whole-file --stats -av /var/lib/backuppc /backups/ >/tmp/backup_fs_rsync.log

# umount so the file is stable
umount /backups
sleep 5
fuser -k /backups
umount /backups

# Start backuppc
/etc/init.d/backuppc start

echo ; echo -n "date 3: " ; date ; echo

# Move things to the sync dir
cd /var/tmp/SYNC_DIR

# Put the backuppc configs in there.
tar -zcf /var/tmp/SYNC_DIR/etc_backuppc.tgz /etc/backuppc/

# Put a sparse-aware tar of the backup itself there
tar -S -cvf /var/tmp/SYNC_DIR/backup_fs.tar SPARSE_FILE

echo ; echo -n "date 4: " ; date ; echo

# rsync to the remote system
#
# We skip --inplace because if a transfer dies partway through, the offsite backup might be corrupted.  That's *BAD*
rsync --delete -H -S --no-whole-file --stats -av /var/tmp/SYNC_DIR/ REMOTE:/var/tmp/SYNC_DIR >/tmp/backup_fs_rsync2.log

echo ; echo -n "date 5: " ; date ; echo

# Remove the lockfile
rm -f /var/tmp/lock.backup_fs_based

echo "


*******************
      REPORT
*******************

"

echo "Line count from first rsync: " ; wc -l /tmp/backup_fs_rsync.log
echo "Bottom bits of first rsync: " ; tail -20 /tmp/backup_fs_rsync.log

echo "Line count from second rsync: " ; wc -l /tmp/backup_fs_rsync2.log
echo "Bottom bits of second rsync: " ; tail -20 /tmp/backup_fs_rsync2.log

What that script does:

- Grabs a lock file so two runs can't overlap, and gives up if another run holds it.
- Stops BackupPC so its pool isn't changing underneath the copy.
- Mounts the local copy of the encrypted file system read-write, with the passphrase fed in by the script.
- rsyncs /var/lib/backuppc into it, preserving hard links (-H) and sparseness (-S).
- Unmounts the encrypted file system (killing anything still holding it open, if necessary) so the backing file is stable, then restarts BackupPC.
- Drops a tarball of /etc/backuppc and a sparse-aware tar of the backing file into the sync directory.
- rsyncs the sync directory to the remote machine.
- Removes the lock file and prints a report from the two rsync logs.

If you've been thinking about this, you may be saying "Why not just tar and then encrypt?". The answer is, in fact, the crux of how this system works: encrypting a whole file is guaranteed to randomize essentially the entire output, so the ciphertext from one run shares nothing useful with the ciphertext from the next, even if the underlying data barely changed. This means the rsync delta algorithm is useless and you transmit the whole file each time. My upstream is 768Kbits/s, and I have ~16GiB of backups, so that's Right Out.
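
If you want to convince yourself of this, here's a small, hypothetical demonstration (the file names and passphrase are made up, and depending on your gpg version you may need extra pinentry options): gpg uses a random session key, so even an unchanged tarball encrypts to a completely different file, and rsync's delta algorithm has nothing to match against.

tar -cf snapshot.tar /some/data 2>/dev/null    # identical plaintext both times
gpg --batch --passphrase secret --symmetric -o day1.gpg snapshot.tar
gpg --batch --passphrase secret --symmetric -o day2.gpg snapshot.tar
cmp -l day1.gpg day2.gpg | wc -l    # nearly every byte differs, so an rsync of
                                    # day2.gpg over day1.gpg resends everything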

OTOH, loop-AES can't re-encrypt an entire file system every time, so it must, of necessity, be block based. This means that only the changed blocks in the file-backed filesystem are modified, and tar (without compression or encryption, anyway) is localized and deterministic: if you change one block, you'll change only a small portion of the resulting file. As a result, my nightly backups look like this:

sent 1210514 bytes  received 660073298 bytes  32639.07 bytes/sec
total size is 22886762182  speedup is 34.61

and take about 6 hours (instead of about 2 weeks).
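
If you want to check the locality claim on your own setup, a crude way to do it (SPARSE_FILE.yesterday is a hypothetical saved copy of the backing file, not something the scripts above keep around) is to count the bytes that actually changed between two runs:

cmp -l SPARSE_FILE.yesterday SPARSE_FILE | wc -l
# The count should be on the order of the data you actually wrote that day,
# not anywhere near the full 32GiB, which is exactly why rsync sends so little.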

It's worth noting that this isn't actually how my offsite backups work; that script backs up to another one of my local servers. The offsite backup is actually pulled from the remote side using a key-based ssh login. In addition, I use logrotate to keep rotated copies of the remote backup. The details there, if you want to do similar things, are left as an exercise for the reader, but here's the actual rsync I use for the pull:

/usr/bin/rsync -e "ssh -i $HOME/.ssh/get_backup" --delete -H -S \
--no-whole-file --stats -av \
rlpowell@digitalkingdom.org:/var/tmp/SYNC_DIR/ /var/tmp/SYNC_DIR/
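
The key-based login is just a dedicated passwordless ssh key (the get_backup file above). A reasonable extra precaution, and purely a suggestion of mine rather than part of the setup described here, is to restrict that key in authorized_keys on the machine being pulled from:

# Illustrative ~/.ssh/authorized_keys entry for the get_backup key; the key
# material itself is elided.  If your rsync ships the rrsync helper script,
# adding command="rrsync -ro /var/tmp/SYNC_DIR" as well limits the key to
# read-only rsync access to that one directory.
no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding ssh-rsa AAAA... get_backup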

Update: Mon Dec 17 16:09:05 PST 2007. I now use trickle in my remote backups, thanks to Tene (a friend of mine), to limit bandwidth usage. rsync gets a bit upset with how trickle does its rate shaping, but it can be dealt with. Here's a shell snippet for it that limits transfers to 25KiB/s in each direction:

# trickle upsets rsync, so we keep trying until it works; once past
# the initial setup phase it seems to continue OK
while [ 1 ]
do
    trickle -t 5 -l 50 -s -d 25 -u 25 \
    /usr/bin/rsync -e "ssh -i $HOME/.ssh/get_backup" \
    --delete -H -S --no-whole-file --stats \
    -av rlpowell@digitalkingdom.org:/var/tmp/SYNC_DIR/ \
    /var/tmp/SYNC_DIR/ && break

    # rsync exits 12 (protocol stream error) or 255 (ssh failure) when trickle
    # trips it up; retry those, and give up on anything else.
    if [ $? -ne 12 -a $? -ne 255 ]
    then
        break
    fi
    sleep 1
done