Simple Site Backup Pattern, uses S3 hadoop

The idea here is to back up your website’s document root and MySQL database on a daily basis. Keeping the backup files on your web server is handy if you screw up your site and need to revert quickly, but it’s no help if you lose the server. Best is to keep a copy on the web server for easy access, and also store one offsite in case of catastrophe.

 

In this tutorial, we’ll keep 7 days of backups in a local /backup/ directory and 30 days of backups on Amazon’s S3. To put the files onto S3, I’m going to use hadoop! Not because I plan on doing Map/Reduce on my backups, but because it provides a simple command line method for putting files into S3. It’s easier than writing my own program to store files on S3.
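If you want to see what a "keep 7 days" rule actually selects before pointing it at real backups, here is a small sketch you can run in a scratch directory. The helper name list_expired and the file names are made up for this demo, and it assumes GNU find and touch (standard on RHEL/CentOS). Note that -mtime +7 counts whole 24-hour periods, so it matches files that are at least 8 days old.

```shell
#!/bin/bash
# Sketch only: show which files a "keep 7 days" rule would delete.
# list_expired and the demo file names are hypothetical.

# list_expired DIR DAYS: print files in DIR last modified more than DAYS days ago
list_expired() {
    find "$1" -type f -mtime +"$2" | sort
}

# Build a throwaway directory with files of different ages
DEMO_DIR=$(mktemp -d)
touch "$DEMO_DIR/today.tar.gz"
touch -d "3 days ago" "$DEMO_DIR/three_days_old.tar.gz"
touch -d "10 days ago" "$DEMO_DIR/ten_days_old.tar.gz"

# Only the 10-day-old file is past the 7-day cutoff
EXPIRED=$(list_expired "$DEMO_DIR" 7)
echo "Would delete: $EXPIRED"

rm -rf "$DEMO_DIR"
```

Listing first and deleting second is a cheap way to avoid surprises; once the list looks right, the same find expression can feed rm.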

 

Note: In the past, I’ve written an article on storing backups on S3 using a deduplication technique. That approach is pretty clever and reduces the total disk space consumed on S3. But it’s much more complex, and if you lost your web server and needed the backup files, you’d have to reconstruct all the code to reassemble them. That would be a pain in a pinch. So, if you just want a super simple way to back up your files, and to retrieve them easily from any machine or browser, this is your article.

 

Note: as a prerequisite, you need to install hadoop on your server. It should be as easy as “yum install hadoop” on a RHEL or CentOS machine. At the time of writing, I have hadoop-0.20-0.20.2+923.21-1 installed.
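Before going further, it can save some head scratching to confirm that the commands the backup relies on are actually on the PATH. This is just a sketch: have_cmd is a hypothetical helper, and the list of commands is my reading of what the backup script in this article calls.

```shell
#!/bin/bash
# Sketch only: check that the tools used by the backup are installed.
# have_cmd is a hypothetical helper, not something hadoop provides.

# have_cmd NAME: succeeds if NAME can be found on the PATH
have_cmd() {
    command -v "$1" > /dev/null 2>&1
}

for cmd in hadoop mysqldump tar gzip; do
    if have_cmd "$cmd"; then
        echo "found:   $cmd"
    else
        echo "MISSING: $cmd"
    fi
done
```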

 

Make the directories

mkdir -p /scripts
mkdir -p /backup

 

/scripts/backup.sh (This makes a backup of your doc root and MySQL db in /backup/ and copies them to S3)

#!/bin/sh

echo mkdir -p /backup/
mkdir -p /backup/
DOMAIN=koopman.me
DOCUMENT_ROOT=/var/www/html
DATABASE_HOST=your_db_host
DATABASE_NAME=your_db_name
DATABASE_USER=your_db_user
DATABASE_PASS=your_db_pass
DATE=`date +"%Y-%m-%d_%H_%M_%S"`

# Dump the database (the echo leaves the password out of the log).
# Note the braces: $DATE_ would be read as a variable named DATE_.
echo "mysqldump -h $DATABASE_HOST -u $DATABASE_USER -p --opt $DATABASE_NAME | gzip > /backup/${DATE}_$DATABASE_NAME.sql.gz"
mysqldump -h $DATABASE_HOST -u $DATABASE_USER -p$DATABASE_PASS --opt $DATABASE_NAME | gzip > /backup/${DATE}_$DATABASE_NAME.sql.gz

# Archive the document root; stop if the directory is missing
echo cd $DOCUMENT_ROOT
cd $DOCUMENT_ROOT || exit 1
echo tar -czf /backup/$DATE.$DOMAIN.tar.gz .
tar -czf /backup/$DATE.$DOMAIN.tar.gz .

# Clean up local backups older than 7 days:
echo "find /backup -type f -mtime +7 | xargs rm -f"
find /backup -type f -mtime +7 | xargs rm -f

# Send only this run's files to S3; older ones were uploaded on earlier runs:
echo "hadoop fs -put /backup/$DATE* s3n://AWS_ID:AWS_KEY@BUCKET/backup/"
hadoop fs -put /backup/$DATE* s3n://your_aws_id:your_aws_secret@your_aws_bucket/backup/
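A backup is only useful if it can be read back, so it may be worth checking the archives now and then without extracting them. Here is a minimal sketch of that idea; check_backup is a hypothetical helper that is not part of the scripts in this article, and the demo files are throwaway ones created in a temp directory.

```shell
#!/bin/bash
# Sketch only: sanity-check backup archives without extracting them.
# check_backup is a hypothetical helper for illustration.

# check_backup FILE: succeeds if FILE is a readable .tar.gz or .gz archive
check_backup() {
    case "$1" in
        *.tar.gz) tar -tzf "$1" > /dev/null 2>&1 ;;  # list the tar contents
        *.gz)     gzip -t "$1" 2> /dev/null ;;        # test gzip integrity
        *)        return 1 ;;                         # unknown file type
    esac
}

# Demo: two good archives and one deliberately broken file
DEMO_DIR=$(mktemp -d)
mkdir "$DEMO_DIR/site"
echo "hello" > "$DEMO_DIR/site/index.html"
tar -czf "$DEMO_DIR/good.tar.gz" -C "$DEMO_DIR/site" .
echo "CREATE TABLE t (id INT);" | gzip > "$DEMO_DIR/good.sql.gz"
echo "not really gzip" > "$DEMO_DIR/bad.sql.gz"

for f in "$DEMO_DIR"/*.gz; do
    if check_backup "$f"; then
        echo "OK      $(basename "$f")"
    else
        echo "BROKEN  $(basename "$f")"
    fi
done

rm -rf "$DEMO_DIR"
```

To use it for real, you would loop over /backup/*.gz instead of the demo directory.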

 

/scripts/purge_hadoop.sh (This deletes backups older than 30 days from your_aws_bucket/backup)

#!/bin/bash
# The [[ ... < ... ]] string comparison below needs bash, not plain sh.

# Cutoff date, 30 days ago (uses GNU date, standard on RHEL/CentOS)
DELDATE=`date -d "30 days ago" +"%Y-%m-%d"`

echo 'hadoop fs -ls s3n://AWS_ID:AWS_KEY@your_aws_bucket/backup/* | sed -re "s/ +/\t/g" > /tmp/d.tmp'
hadoop fs -ls s3n://your_aws_id:your_aws_secret@your_aws_bucket/backup/* | sed -re "s/ +/\t/g" > /tmp/d.tmp

# Each line is now tab-separated: field 4 is the date, field 6 is the path
while IFS= read -r i; do
        DATE=`echo "$i" | cut -f 4`
        FILE=`echo "$i" | cut -f 6`
        echo "DATE=$DATE, FILE=$FILE, DELDATE=$DELDATE"
        # Skip any line that has no date or path, then compare the dates as strings
        if [[ -n "$DATE" && -n "$FILE" && "$DATE" < "$DELDATE" ]]; then
                echo hadoop fs -rm s3n://AWS_ID:AWS_KEY@your_aws_bucket$FILE
                hadoop fs -rm s3n://your_aws_id:your_aws_secret@your_aws_bucket$FILE
        fi
done < /tmp/d.tmp
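The purge works because dates written as YYYY-MM-DD sort the same way as plain strings, so a string comparison is enough to tell which files are past the cutoff. Here is a small sketch of that check on its own. The helper is_expired and the sample listing line are made up for illustration, and real "hadoop fs -ls" output may be laid out differently depending on your version, so check the field numbers against your own listing.

```shell
#!/bin/bash
# Sketch only: decide whether one listing line is older than a cutoff date.
# is_expired and SAMPLE are hypothetical; adjust the field number to your output.

# is_expired LINE CUTOFF: succeeds if the date in field 4 is earlier than CUTOFF
is_expired() {
    local file_date
    # Collapse runs of spaces into tabs, then take the 4th field (the date)
    file_date=$(echo "$1" | sed -re "s/ +/\t/g" | cut -f 4)
    # Lines without a date never count as expired
    [[ -n "$file_date" && "$file_date" < "$2" ]]
}

SAMPLE="-rwxrwxrwx   1       2048 2012-01-15 01:01 /backup/2012-01-15_01_01_01.koopman.me.tar.gz"

if is_expired "$SAMPLE" "2012-03-01"; then echo "expired"; fi
if ! is_expired "$SAMPLE" "2012-01-01"; then echo "kept"; fi
```

With the sample line above, the first check prints "expired" and the second prints "kept".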

 

Make sure to chmod 700 your /scripts/* files, since they contain your database password and AWS secret:

chmod 700 /scripts/backup.sh
chmod 700 /scripts/purge_hadoop.sh

 

Next, make them run daily:

 

/etc/cron.d/backup (This runs at 01:01 daily)

1 1 * * * root /scripts/backup.sh > /var/log/backup.log 2>&1

 

/etc/cron.d/purge_hadoop (This runs at 00:05 daily)

5 0 * * * root /scripts/purge_hadoop.sh > /var/log/purge_hadoop.log 2>&1

 

And that’s all. This is about as simple a pattern for a daily backup strategy as it gets.

 

Dave
