I have a simple need. I want to use rsync to copy various directories on a root server from 1&1 to my Mac. I set all this up before, but a couple of days ago, the root server refused to reboot. After a lot of tinkering (and swearing) using the recovery systems and FAQs supplied by 1&1, I couldn’t fix it (also, see “An Aside” below), so I re-imaged the entire machine. This completely rewrites the box, so the backup must be set up all over again. Setting up an rsync backup turns out to be more difficult than it needs to be, requiring a number of additional steps. Naturally, I didn’t write these down last time, so I had to rediscover both the problems and the solutions. This time, I’m recording them here so that a) I can find them when I need them again and b) on the off chance that they might help somebody.

I should mention that I am not a Linux guru by any stretch of the imagination. What follows will probably bore real Linux geeks to tears. Or, maybe just make them chuckle at my incompetence. Some or all of the following could be wrong or a bad idea. If so, please leave some comments (that’s the third reason I’m posting this).

The Problem

Backing up a machine with rsync is much quicker than using, say, SFTP. This is because rsync only copies things that have changed since the last time you ran it. So, the initial backup pulls everything, but each time it runs after that, only the small number of altered files is transmitted over the network. This requires rsync software on both the source and destination machines, as they communicate to decide whether a file needs to be transmitted, and this is where my problem is.
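To see the "only what changed" behaviour for yourself, rsync can report what it would transfer without copying anything. The following is a minimal sketch rather than something from my own setup: preview_sync is a made-up helper name, and the host and paths in the usage line are placeholders. It relies on rsync's --dry-run and --itemize-changes options.

```shell
#!/bin/sh
# preview_sync is a hypothetical helper, not part of rsync itself.
# It lists what rsync would copy from SRC to DEST without changing anything.
preview_sync() {
    src="$1"
    dest="$2"
    # --dry-run: report only, transfer nothing
    # --itemize-changes: print one line per item that differs
    rsync --recursive --times --dry-run --itemize-changes -e ssh "${src}" "${dest}"
}

# Usage (placeholder host and paths):
#   preview_sync "user@example.com:httpdocs/" "/Volumes/Backup Disk/example.com/"
```

Against an empty destination this should list everything; once a real backup has run, it should list only the items that have changed since.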

I re-imaged the server using a 1&1 supplied image that sets up the box using a Fedora distribution, all set up to use Plesk and the other standard bits that 1&1 provides. Unfortunately, rsync is not one of those bits, for reasons that are not clear. It is possible that 1&1 wants you to buy backup services from them instead. So, you need to install rsync on the server.

The “Fedora way” of installing software is to use a tool called yum, which “automatically computes dependencies and figures out what things should occur to install packages”. This tool is included in the 1&1 Fedora image, so the following command (as root) should do the trick:

yum install rsync

And here is where things get ugly. While the yum program is present, its configuration is messed up:

# yum install rsync
Setting up Install Process
Setting up repositories
core                      100% |=========================| 1.1 kB    00:00
http://update.onlinehome-server.info/distribution/fedora/linux/core/updates/6/x86_64/repodata/repomd.xml: [Errno 12] Timeout: 
Trying other mirror.
Error: Cannot open/read repomd.xml file for repository: updates-released

At this point, I could just manually install rsync and call it a day, but I really would like yum to be working for some other reasons. The error indicates a URL timeout, which means that yum is likely trying to contact a site that isn’t actually there. So, an obvious thing to try is changing yum to point to a different server.

Yum keeps its configuration in two places: the /etc/yum.conf file and a number of files in the /etc/yum.repos.d directory. A quick grep onlinehome /etc/yum.* shows the bad URL is in /etc/yum.conf. Looking at all the other repos in /etc/yum.repos.d, it isn’t clear if the two repos in /etc/yum.conf are even needed. They appear to be 1&1 specific, pointing to a server that 1&1 doesn’t seem to be paying attention to. Certainly, for rsync, they are probably not needed. So, let’s try telling yum to ignore those two. According to the config file, they are named “base” and “updates-released”. After a look at the man page, I tried this:

yum clean all
yum --disablerepo=updates-released,base install rsync

This seems to work like a charm. So, the source for the backup should be ready to go.

There may also be another solution. The update server that 1&1 uses is inside the same firewall as the root server, so can’t be seen from the internet. Even from the root server, it appears that, by default, the root server can only get to the update server using ftp, not http. This is why yum times out when trying to connect to it. It could be that altering the config to use ftp URLs would work.
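If you wanted to try that, the change would amount to swapping the URL scheme on the 1&1 repository lines. I have not tested this. The sketch below assumes the repos are defined with baseurl= lines and that the same paths exist on the ftp side, neither of which I have confirmed; to_ftp is a made-up helper name, and only the host name comes from the error message above.

```shell
#!/bin/sh
# Untested sketch: rewrite http:// to ftp:// for the 1&1 update host only.
# to_ftp is a hypothetical helper; it reads a yum config on stdin and
# writes the modified version to stdout, leaving other repos untouched.
to_ftp() {
    sed 's|^\(baseurl=\)http://\(update\.onlinehome-server\.info\)|\1ftp://\2|'
}

# Usage: work on a copy, inspect the difference, and only then replace
# the real file.
#   to_ftp < /etc/yum.conf > /tmp/yum.conf.ftp
#   diff /etc/yum.conf /tmp/yum.conf.ftp
```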

I have no idea why the install images provided by 1&1 are configured in such a way that they don’t actually function. I’ve sent them mail about this, but they have not replied.

The Destination

The destination of the backup information is a disk on my Mac, which is running Leopard. Rsync comes with Mac OS X, so should already be ready to go. I have set up a “webbackup” script, tailored to the specific sites I want backed up, and I was running this at noon each day as a cron job.

Or, I was. Until I installed Leopard.

Unbeknownst to me, doing an “upgrade” install of Leopard empties out the crontabs of all users, stashing copies in /var/cron/tabs. This deactivates your jobs without warning. This means that the webbackup I thought I had actually hadn’t been updated for several weeks. Fortunately, I managed to suck down a copy of my web folders and the MySQL data folder the hard way (see “An Aside”, below) before re-imaging everything.

Anyway, my backup script looks similar to the following:


#!/bin/sh

# The full path to the place on the local machine that will hold the backup
DEST="/Volumes/Backup Disk"

# The full path to the place on the local machine that will hold database backups
# (placeholder value; use your own)
SQLDEST="${DEST}/sql"

# The name of the target domain. This name is used both to connect to the
# target server, as well as the name of a directory created in ${DEST} to
# hold the backup data.
# (placeholder value; use your own)
DOMAIN="example.com"

# The username and server that are used to log into the target machine.
# (placeholder value; use your own)
USER="user@${DOMAIN}"

# The path to the directory on the target that will get echoed to the local.
# If you use a relative path here (no starting slash), it will be relative to
# the home directory on the target machine. So, if you leave this empty,
# it will suck down the whole target directory. You can also use absolute
# paths.
USERPATH=""

# Create the destination directories; -p keeps quiet if they already exist
mkdir -p "${DEST}/${DOMAIN}" "${SQLDEST}"

/usr/bin/rsync --recursive -t --delete --ignore-errors -e 'ssh' "${USER}:${USERPATH}" "${DEST}/${DOMAIN}/"

# For each database, do the following
MYSQLDUMP="mysqldump --opt --quote-names -u dbuser --password=dbpassword database_name"
/usr/bin/ssh "${USER}" "${MYSQLDUMP}" > "${SQLDEST}/tmp.sql"
# Only replace the previous dump if the new one is not empty
if [ -s "${SQLDEST}/tmp.sql" ]; then
   mv "${SQLDEST}/tmp.sql" "${SQLDEST}/database_name.sql"
fi

Being intended for use on Macs, this script should work even for file paths containing spaces. It would use a lot fewer quote characters if you didn’t need to worry about that. You should be able to adjust this script to add additional database backups, extra domains, etc.
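As one way of adding more databases, the dump step could be wrapped in a function and called once per database. This is a sketch only: dump_db is a made-up name, the values of USER and SQLDEST are placeholders, and the mysqldump credentials are the same stand-ins used in the script.

```shell
#!/bin/sh
# Placeholder values; substitute your own.
USER="user@example.com"
SQLDEST="/Volumes/Backup Disk/sql"

# dump_db is a hypothetical helper: dump one database over ssh, and keep
# the previous dump if the new one comes back empty.
dump_db() {
    db="$1"
    tmp="${SQLDEST}/${db}.tmp.sql"
    ssh "${USER}" "mysqldump --opt --quote-names -u dbuser --password=dbpassword ${db}" > "${tmp}"
    if [ -s "${tmp}" ]; then
        mv "${tmp}" "${SQLDEST}/${db}.sql"
    else
        rm -f "${tmp}"
        echo "dump of ${db} was empty; keeping previous copy" >&2
        return 1
    fi
}

# Usage (database names are examples):
#   for db in blog_db forum_db; do dump_db "${db}"; done
```

Giving each database its own temporary file means a failed dump of one database cannot overwrite a good dump of another.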

This script uses ssh, as does rsync. So, when you run it, most likely you will get asked to enter your password several times. This is irritating and, if the idea is to have this happen automatically, problematic.

It is possible to set up ssh such that keys can be shared between the target and local machines, using them for validation instead of a password. This is less secure, because any ssh connection from the same local user to that user/target combination will automatically connect without a password. If you are away from your machine while logged in, this can be a bad breach.

I created a special “backup” user on my Mac to do this kind of thing. This user has limited rights to the rest of the machine, and serves only the purpose of backing up stuff. Since I am almost never logged in as this user, it minimizes the threat of me accidentally leaving the machine still logged in as “backup”.
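The key setup described above might look something like the following, run as the backup user. Treat it as a sketch under assumptions: make_key and install_key are made-up helper names, the login in the usage lines is a placeholder, and it assumes the target accepts keys listed in ~/.ssh/authorized_keys, which is the usual OpenSSH arrangement.

```shell
#!/bin/sh
# Hypothetical helper names; run these as the dedicated backup user.

# Create a key pair with an empty passphrase (that is what lets cron use
# it unattended), unless a key already exists at the given path.
make_key() {
    keyfile="$1"
    [ -f "${keyfile}" ] || ssh-keygen -t rsa -N "" -f "${keyfile}"
}

# Append the public half of the key to ~/.ssh/authorized_keys on the
# target. You will be asked for the password one last time here.
install_key() {
    keyfile="$1"
    target="$2"
    ssh "${target}" 'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys' < "${keyfile}.pub"
}

# Usage (placeholder login):
#   make_key "${HOME}/.ssh/id_rsa"
#   install_key "${HOME}/.ssh/id_rsa" "user@example.com"
```

After this, ssh user@example.com from the backup account should connect without prompting.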

Once this is done, try running the webbackup script from the local machine by hand a few times. Once it works the way you want, put the script somewhere (referred to here as /path/to/webbackup). To add it into cron, you need to add an entry to your crontab. Log into the local machine account you will use for backing up and get the crontab into editable mode using crontab -e. The command I use backs up every day at noon and looks something like this:

0 12 * * * /path/to/webbackup >> /path/to/webbackuplog.txt 2>&1
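The five fields at the start are minute, hour, day of month, month and day of week, so "0 12 * * *" means 12:00 every day. If you would rather not edit the crontab by hand, something like the following should add the line without duplicating it. It is a sketch: add_entry is a made-up helper name, and the paths are the same placeholders as above.

```shell
#!/bin/sh
ENTRY='0 12 * * * /path/to/webbackup >> /path/to/webbackuplog.txt 2>&1'

# add_entry is a hypothetical helper: it copies a crontab from stdin to
# stdout, drops any line identical to the entry, then appends the entry once.
add_entry() {
    # -F: match literally, -x: whole line, -v: keep everything else.
    # grep exits non-zero when nothing is left over, hence the "|| true".
    grep -vxF -e "$1" || true
    printf '%s\n' "$1"
}

# Usage: read the current crontab, add the line, and load the result.
#   crontab -l 2>/dev/null | add_entry "${ENTRY}" | crontab -
```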

An Aside

I mentioned at the start that I tried using 1&1’s recovery tools. These boot the system from a minimal image, rather than the OS on the box, allowing you to rummage around the box. This allowed me to suck the data that hadn’t been backed up off the machine before I re-imaged it, which saved my butt. Doing this requires that you manually mount the machine’s disks. They provide instructions on how to do this, but as of the writing of this post, these are out of date. Their servers now use a RAID in mirrored mode (a.k.a. RAID 1), which can’t be mounted following their instructions.

If you follow their document, your mount commands return errors saying the device is “already mounted or /mnt busy”. This error message is even semi-true. What seems to be happening is that the RAID is marking these drives as “in use” but the RAID as a whole is not mounted. So, you need to mount the RAID. This is similar to their instructions, but uses different devices. A forum entry suggested the command cat /proc/mdstat to display the names of the RAID devices. In my case, these were /dev/md1 and the like. It turned out that these were set up with similar mappings to those described in the 1&1 instructions, so similar mount commands worked. The file systems were also autodetected, which helps:

rescue:~> mkdir /mnt/raid
rescue:~> mount /dev/md1 /mnt/raid
rescue:~> mount /dev/md5 /mnt/raid/usr
rescue:~> mount /dev/md6 /mnt/raid/home
rescue:~> mount /dev/md7 /mnt/raid/var
rescue:~> ls /mnt/raid/var/www/vhosts
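To pick the device names out of /proc/mdstat without reading through the rest of its output, a small filter along these lines should do. It is a sketch: md_devices is a made-up helper name, and it assumes each array is reported on a line starting with its name followed by a colon, as in "md1 : active raid1 ...".

```shell
#!/bin/sh
# md_devices is a hypothetical helper: it reads /proc/mdstat-style text on
# stdin and prints the device node for each array it finds.
md_devices() {
    awk '/^md[0-9]+ *:/ { print "/dev/" $1 }'
}

# Usage in the rescue system:
#   md_devices < /proc/mdstat
```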

Once you have these drives mounted, you should be able to use scp to suck the data you need off the machine, at the very least. Ideally, you should also be able to alter whatever files caused the problem that necessitated the recovery tools. In my case, the problem seemed to occur out of the blue, not from messing with config files or something, so after a few attempts to figure out what was going on, I just nuked the system.