Scripted backups.

Many people struggle with how to deal with their backups. Normally backups are a problem handled by some administrator who copies a whole bunch of stuff onto a tape cassette, often without knowing how, or even whether it is possible, to restore everything from the data on that tape.

I approached this from a different angle.

As developers we needed a staging environment for the application we are developing. Using the live servers for that is not okay in my opinion. So I found an old P4 box with 1 GB of RAM, slapped in a new HDD for the databases and started setting it up as a development server. Making the code portable enough to move to this server was one thing, and is another story. But getting current versions of the databases onto this server was where the backups came into play.

Using the development server as a test case for the backups, I had everything I needed to build a setup where I could make sure that each and every backup taken actually works. Each backup is restored to the development server, and as we developers use the development server, we see if some data has gone terribly wrong. Of course, this does not protect us from the scenario where some data is malformed in a uniform way that nobody notices, which would make a broken backup look fine. However, all data which goes into our system is also backed up separately, so everything can be rebuilt if needed. That data is checked via MD5 hashes, so corruption there would be detected.
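To make the hash check concrete, here is a minimal sketch of how such a verification could look. It is not our actual code: the function names are made up for this post, and I am assuming the MD5 of each incoming file was recorded somewhere when the file first arrived.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """Compare the file's current digest with the one recorded at intake (assumed to exist)."""
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        sample = os.path.join(workdir, "upload.bin")
        with open(sample, "wb") as handle:
            handle.write(b"incoming data")
        recorded = md5_of_file(sample)  # what would have been stored at intake time
        print(is_intact(sample, recorded))
```

MD5 is good enough to catch accidental corruption like this; it is not meant as protection against someone tampering with the files on purpose.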

So, the backups are taken to a backup server, custom built for the purpose. We wanted a backup outside the server park, so we decided to put this server at our office. I built the box myself, based on a Chieftec chassis with an Intel motherboard and an Intel processor. I used a HighPoint RocketRAID 3210 as the RAID controller and built a RAID 5 from four Western Digital 1 TB hard drives. We also have two spare drives in the server, so in case we need to rebuild the RAID we do not have to wait for a delivery.

A word or two on security.

When doing cross-server backup transfers, you must always make sure you know your security priorities. In my case I trust the backup server far more than the live servers. In my opinion this is how it usually should be. The backup servers are exposed to no one but the live servers and the developers, while the live servers are exposed to the whole world. Thus, the more likely first target of an attack is the live servers. If the backup servers trust the live servers, and the live servers are compromised, the backup server will be compromised as well. This is why we build the backup scripts on the backup server and make them fetch the data from the live servers, rather than having the live servers push the data to the backup server.

The backup scripts are in our case built in PHP, so all developers at this company can understand them. I would rather have built them in straight bash or perhaps Python, but the advantage of more people being able to understand all the code is far greater than having used “that cool programming language”. Of course the PHP executes complex shell commands to do most of the actual work, but the flow of the program is entirely PHP.

So, how do you set all this up?

The way to set up this kind of script is with SSH keys, crontab and some basic PHP code.

First you need to set up a user on the live servers who has only the privileges needed to read the files and databases which need to be backed up. I’m calling this user ‘backup’ for easy reference.
The adduser command differs slightly between distros, so read up on it if you are unsure.

man adduser

Secondly you need to give the backup user on your live server the public key of the backup user on your backup server. So log on to your backup server as your backup user and type:

ssh-keygen -t rsa

A file “id_rsa.pub” should be generated in the .ssh directory in your home directory (~/.ssh/id_rsa.pub). This file contains one (1) line, and this line is the public key of this user at this machine. It identifies this user at this machine. What you need to do now is to transfer this line into the authorized keys file of the backup user on the live server (~/.ssh/authorized_keys). This file may not even exist on the live server, but if it does, make sure you only add your public key line to the end of it. Do not overwrite it. The magical unix command for doing this would be:

cat ~/.ssh/id_rsa.pub | ssh backup@your.liveserver "cat >> ~/.ssh/authorized_keys"

Of course, you should double check this kind of thing manually and not rely on a copy-pasted command from blogs like mine. The command above does give you a hint on how the backups are transferred, though.
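If you prefer to do the "append, never overwrite" step by hand on the live server, the logic is roughly the following. This is only a sketch: append_public_key is a name I made up, and it assumes you have already copied the public key line over by some other means.

```python
import os
import tempfile


def append_public_key(authorized_keys_path, public_key_line):
    """Append a key line unless it is already present. Returns True if the file changed."""
    key = public_key_line.strip()
    content = ""
    if os.path.exists(authorized_keys_path):
        with open(authorized_keys_path, "r") as handle:
            content = handle.read()
    if key in (line.strip() for line in content.splitlines()):
        return False  # already authorized, nothing to do
    with open(authorized_keys_path, "a") as handle:
        # Make sure we start on a fresh line so we never glue two keys together.
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "authorized_keys")
        print(append_public_key(path, "ssh-rsa AAAAexample backup@backupserver"))
        print(append_public_key(path, "ssh-rsa AAAAexample backup@backupserver"))
```

Opening the file in append mode is the equivalent of the >> in the shell command: existing keys are left alone.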
Once the public key of the backup user on the backup server is in the authorized keys file of the backup user on the live server, you should be able to log in from the backup server to the live server as the backup user without being prompted for a password. If this is not the case, you’ve done something wrong. Start from the beginning.
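A script can check this for you before it attempts any backup. The sketch below relies on ssh's BatchMode option, which makes ssh fail instead of asking for a password; the function names and the ten second timeout are my own choices for the example.

```python
import subprocess


def key_login_command(user, host):
    """Build an ssh command that can only succeed through key authentication."""
    return [
        "ssh",
        "-o", "BatchMode=yes",       # never prompt for a password, just fail
        "-o", "ConnectTimeout=10",   # arbitrary timeout chosen for this sketch
        f"{user}@{host}",
        "true",                      # harmless remote command
    ]


def key_login_works(user, host):
    """Return True if the passwordless login succeeds."""
    result = subprocess.run(key_login_command(user, host), capture_output=True)
    return result.returncode == 0
```

Calling key_login_works("backup", "your.liveserver") from the backup server should return True once the key is in place.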

So, now we need to start writing the actual script which controls the flow of the backup process. I will explain it in a way that can be applied to any programming language.

The first thing we need to do is find out which databases we need to back up from the server. We do this with a simple shell command executed from within the script, and then parse the output it gives us. Why do we not use a local mysql client to query the database directly and get a mysql result set? Simple. We do not want to expose the live mysql to the outside world. So what we do is issue the following command in a shell controlled by the script (so we can get the output into our code and use it):

ssh backup@your.liveserver mysqlshow

This will initiate a new ssh connection to your live server, and the command mysqlshow is sent over that connection, just as if you had logged in and run it yourself. All output from mysqlshow is returned and ends up in your script.
Once you have this working you have to parse what mysqlshow throws at you. This should be a no-brainer. I use the regular expression /[\w]+/ because all we really need to do is strip away the spaces and pipes of the table it draws. Of course, this regular expression must match all the databases you need to back up. If you only have one database to back up, you do not really need this step.
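Here is a rough sketch of that parsing step. The sample output is typed from memory of what mysqlshow prints (a bordered table with a "Databases" header), and the database names in it are invented, so check it against your own server. parse_mysqlshow and its skip parameter are hypothetical names.

```python
import re

# Illustrative output only; the exact layout on your server may differ.
SAMPLE = """\
+--------------------+
|     Databases      |
+--------------------+
| information_schema |
| shop               |
| wiki               |
+--------------------+
"""


def parse_mysqlshow(output, skip=("information_schema",)):
    """Pull database names out of the table drawn by mysqlshow."""
    names = []
    for line in output.splitlines():
        # A data row is a pipe, padding, one word, padding, a pipe.
        match = re.fullmatch(r"\|\s*(\w+)\s*\|", line.strip())
        if match:
            names.append(match.group(1))
    # The first row that matches is the column header, not a database.
    if names and names[0] == "Databases":
        names = names[1:]
    return [name for name in names if name not in skip]


if __name__ == "__main__":
    print(parse_mysqlshow(SAMPLE))
```

Note that \w does not match a hyphen, so a database called something like my-shop would be skipped by this pattern. That is what I meant by the expression having to match all your databases.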

The next thing the script does, for each database we want to back up, is to get the dump from the database. But how do we do this without actually writing everything to disk as dump files on the live server? Easy. This command may seem complex, but I will walk you through it step by step:

ssh backup@your.liveserver "mysqldump databasename | gzip -c" > /your_backup_location/databasename_date.mysqldump.gz

OK, this may look scary to some of you. If you are not familiar with the mysqldump command, what it does is output, in plain text, a full recreation script for the database in question. So we initiate an ssh connection and tell the server to start outputting the whole database. Then we take everything it outputs and hand it over to gzip, which compresses the output and, with the -c parameter, just passes the compressed output on. All of this ends up on your side of the ssh connection, where you in turn tell it to go into the file /your_backup_location/databasename_date.mysqldump.gz
All data is already compressed when it gets to your side of the connection.
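Inside a script, the same command can be put together roughly like this. This is a sketch and not our PHP code: the function names are invented, and I am assuming an ISO date (2010-01-31 style) for the "date" part of the file name.

```python
import datetime
import shlex
import subprocess


def dump_command(user, host, database):
    """Build the ssh command that streams a gzipped dump back to us."""
    # Quote the database name so odd characters cannot break the remote shell command.
    remote = f"mysqldump {shlex.quote(database)} | gzip -c"
    return ["ssh", f"{user}@{host}", remote]


def dump_path(backup_dir, database, day):
    """File name pattern from the post; the ISO date format is an assumption."""
    return f"{backup_dir}/{database}_{day.isoformat()}.mysqldump.gz"


def fetch_dump(user, host, database, backup_dir, day=None):
    """Run the dump and write whatever arrives over ssh straight into the target file."""
    day = day or datetime.date.today()
    target = dump_path(backup_dir, database, day)
    with open(target, "wb") as out:
        result = subprocess.run(dump_command(user, host, database), stdout=out)
    if result.returncode != 0:
        raise RuntimeError(f"dump of {database} failed with exit code {result.returncode}")
    return target
```

One thing to be aware of: ssh hands back the exit status of the remote command, and for a plain shell pipeline that is the status of gzip, not of mysqldump. So a zero exit code here does not prove the dump itself went well, which is one more reason to actually restore your backups.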
Restoring from this file is also easy. The following command can be used for that (point it at whichever server you want to restore to):

cat /your_backup_location/databasename_date.mysqldump.gz | ssh backup@your.liveserver "gunzip -c | mysql databasename"
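Before piping a file into mysql, it can be worth checking that it decompresses at all, since a dropped connection can leave a truncated file behind. A small sketch of such a check follows; gzip_is_readable is a made-up name, and the check says nothing about whether the SQL inside is sane.

```python
import gzip
import os
import tempfile
import zlib


def gzip_is_readable(path, chunk_size=1 << 20):
    """Return True if the file is non-empty and decompresses cleanly to the end."""
    if os.path.getsize(path) == 0:
        return False  # an empty file reads without error, but it is not a backup
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass  # we only care that reading reaches the end without an error
    except (OSError, EOFError, zlib.error):
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        sample = os.path.join(workdir, "databasename.mysqldump.gz")
        with gzip.open(sample, "wb") as handle:
            handle.write(b"CREATE TABLE example (id INT);\n")
        print(gzip_is_readable(sample))
```

This reads the whole file once, which is cheap compared to finding out halfway through a restore that the dump is broken.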

So, with the script done, we need to put it into a crontab on the backup server. Note that nothing apart from the backup user and its authorized_keys file has been changed on the live server. The command

crontab -e

lets you edit your crontab. Here you need to write a crontab line. If the crontab is empty, it is hard to remember which order the fields go in, so here is a reference:

min     hour     day_of_month     month      day_of_week      command to be executed

So if we want this script to be executed every evening at 23:42 except on Sundays, because the server does other heavy work then, we would write something like this (1-6 means Monday through Saturday):

42      23         *           *            1-6        /path/to/script
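If you want to convince yourself what a line like this means, here is a toy matcher for the five time fields. It is a deliberately simplified sketch: it only understands *, single numbers, ranges and comma lists, ignores step values and month or day names, does not treat 7 as Sunday, and simply requires every field to match. Real cron is more subtle when both day fields are restricted.

```python
import datetime


def field_matches(field, value):
    """Match one crontab field: '*', 'n', 'a-b' or a comma separated mix of those."""
    if field == "*":
        return True
    for part in field.split(","):
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(line, moment):
    """Return True if the crontab line would fire at the given datetime."""
    minute, hour, day_of_month, month, day_of_week = line.split()[:5]
    cron_weekday = moment.isoweekday() % 7  # crontab counts 0 = Sunday ... 6 = Saturday
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day_of_month, moment.day)
        and field_matches(month, moment.month)
        and field_matches(day_of_week, cron_weekday)
    )


if __name__ == "__main__":
    line = "42 23 * * 1-6 /path/to/script"
    saturday = datetime.datetime(2024, 1, 6, 23, 42)
    sunday = datetime.datetime(2024, 1, 7, 23, 42)
    print(cron_matches(line, saturday), cron_matches(line, sunday))
```

In the example, 6 January 2024 is a Saturday, so the line matches, and the Sunday after it does not.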

A better reference for crontab can be found here: http://www.adminschoice.com/docs/crontab.htm#Crontab%20file
