Simple Site Backup Pattern, Using S3 and Hadoop

The theory here is that you want to back up your website’s document root and MySQL database on a daily basis. Storing the backup file on your web server is fine if you screw up your site, because you can revert easily, but it’s no help if you lose the server. The best approach is to keep a copy on your web server for easy access and also store one offsite in case of catastrophe.

In this tutorial, we’ll keep 7 days of backups in a local /backup/ directory and store 30 days of backups on Amazon’s S3. To put the files onto S3, I’m going to use Hadoop! Not because I plan on doing Map/Reduce on my backups, but because Hadoop provides a simple command-line method for putting files into S3. It’s easier than writing my own program to store files on S3.
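
To make the retention idea concrete, here is a minimal Python sketch of the "keep N days" rule, plus a helper that builds the upload command. The file naming scheme (site-YYYY-MM-DD.tar.gz), the function names, and any destination URI you pass in are my assumptions for illustration, not something taken from the full article. The same function works for the 7-day local window and the 30-day S3 window.

```python
from datetime import date, timedelta
import re

# Hypothetical naming scheme: one archive per day, e.g. "site-2011-06-08.tar.gz".
BACKUP_NAME = re.compile(r"^site-(\d{4})-(\d{2})-(\d{2})\.tar\.gz$")


def expired_backups(filenames, today, keep_days):
    """Return the backup files that fall outside the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in filenames:
        match = BACKUP_NAME.match(name)
        if not match:
            continue  # never touch files that do not look like our backups
        year, month, day = (int(part) for part in match.groups())
        # With keep_days=7, today and the six days before it are kept.
        if date(year, month, day) <= cutoff:
            expired.append(name)
    return sorted(expired)


def hadoop_put_command(local_path, remote_uri):
    """Build the 'hadoop fs -put' command line that copies one file offsite."""
    return ["hadoop", "fs", "-put", local_path, remote_uri]
```

You would call expired_backups(os.listdir("/backup"), date.today(), 7) for the local directory and hand the result of hadoop_put_command to subprocess. The remote URI depends on how your Hadoop install is configured for S3, so treat it as a placeholder.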

Note: In the past, I’ve written an article on storing backups on S3 using a deduplication technique. It’s pretty clever and reduces the total disk space consumed on S3. But it’s much more complex, and if you lost your web server and needed access to the backup files, you’d need to reconstruct all the code to reassemble your files. That would be a pain in a pinch. So, if you just want a super simple way to back up your files, and you want to be able to retrieve them easily from any machine or browser, this is your article.


Scalability Rules – Review

I just read the book “Scalability Rules: 50 Principles for Scaling Web Sites” by Martin L. Abbott and Michael T. Fisher. I’d like to start out by saying that having the opportunity to meet the authors of this book was an honor. I only wish that I had read the book before meeting them. I’m inspired by this book; the length of this blog post should be a testament to that.

The book was an easy read and spot on. You can read the whole book in one sitting; I did, one Sunday afternoon. Many times I marveled at how we’re all coming to the same realizations at different companies. I’ve been living this through experience, working at a very fast-growing Internet company with millions of customers, dozens of SaaS-based services, and several data centers, some of them international. What I liked about this book was the affirmation of beliefs I share with those I work with. I could demonstrate examples of nearly all of these rules across our array of online services. There were plenty of aha moments! This book is a great introduction to many (all the important ones?) advanced web application scalability topics. If you think you already know them all, think again. Give this book a read. If you’re already an advanced web app architect, you’ll breeze through much of it, then get an eye-opening surprise or three.

I’d like to reinforce how much I enjoyed the affirmation of my own beliefs, and the eye-openers. Never before have I seen all of these principles/rules/beliefs (whatever you want to call them) together in one easy-to-reference book. I’m going to buy many copies of this and hand them out at work, with the instruction: we should all know these rules inside and out through our combined experiences, and this book sums them all up. It’s a must-have reference for your desk.

Scalability Rules is very modern, in that it discusses the very latest in large-scale web application trends. These aren’t the principles from 2000 or 2005; this is the culmination of all the latest trends, up to 2010 and 2011. Seriously, back in 2005, this stuff hadn’t surfaced yet. Some of the horizontal scaling principles existed, but the more modern techniques (sharding, NoSQL, page caches, object caches, CDNs, and more) hadn’t been in sustained use long enough for any of us to know whether they were really worth the trouble. Very few sites in 2005 required much more than 2 or 3 web servers behind a load balancer and a database. I anticipated the growth that was about to happen, but it was hard to really know what it’s like until you live it.

Here are my brief comments on each of the rules:

HAProxy for IPv6 translation to IPv4-only website

Background:
Have you heard of World IPv6 Day? On June 8, 2011, a lot of very prominent web sites, like Google, Facebook, Yahoo and many more, are going to host their web sites on dual stack for the day. They do this by publishing an AAAA DNS record (that’s an IPv6 address in DNS), so their site will resolve and be available on both IPv4 and IPv6 simultaneously. In other words, if you type in www.google.com on June 8, 2011 and your computer can reach the IPv6 Internet, then your browser will fetch the AAAA record and connect to Google’s site via IPv6 instead of IPv4. If you don’t have IPv6, you’ll just connect the same old way you do today. Either way, it’s going to be rather transparent to the end user, unless these sites flash something to users to say “HEY, YOU CONNECTED OVER IPv6”.
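
As a rough illustration of that dual-stack behavior, here is a small Python sketch of how a client might choose between the addresses it gets back from the A and AAAA lookups. The function name and the simple "prefer IPv6 if I can reach it" rule are my own simplification; real operating systems and browsers apply more elaborate address selection rules. The addresses in the usage note are reserved documentation addresses, not real hosts.

```python
import ipaddress


def choose_address(records, client_has_ipv6):
    """Pick the address a dual-stack-aware client would try first.

    records is a list of IP address strings, as returned by A and AAAA lookups.
    """
    parsed = [ipaddress.ip_address(record) for record in records]
    ipv6 = [addr for addr in parsed if addr.version == 6]
    ipv4 = [addr for addr in parsed if addr.version == 4]
    if client_has_ipv6 and ipv6:
        return str(ipv6[0])  # AAAA record wins when the IPv6 Internet is reachable
    if ipv4:
        return str(ipv4[0])  # otherwise connect the same old way
    return None  # nothing this client can reach
```

For example, choose_address(["192.0.2.10", "2001:db8::1"], client_has_ipv6=True) returns the IPv6 address, and the same call with client_has_ipv6=False returns the IPv4 one.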

Challenge:
So, thinking about any web site out there that currently lives on IPv4, how can we make it dual stack, without owning or touching the existing servers? Answer: with a proxy. We want this proxy to be a separate machine, anywhere on the Internet, that already has dual stack hosting.

The dedicated, dual-stacked proxy server will listen on an IPv6 address and forward that traffic to an IPv4 address. Can this be done reliably for a web site for World IPv6 Day? I think yes, it can. For one, the percentage of Internet traffic that’ll come over IPv6, even on this day, is only about 1% to 5%. So, as long as this proxy server can handle 5% of your normal load, it’ll work.

You can use HAProxy, available at http://haproxy.1wt.eu/, to turn your Linux or Solaris based dedicated (or virtual dedicated) server into an IPv6 translation proxy! And, it’ll work for both HTTP and HTTPS.

You don’t need to load the HTTPS SSL certificate on the proxy, either. HAProxy can proxy at the TCP level instead of the HTTP level, so the end user’s SSL session is negotiated directly with the origin server. The only caveat is that all traffic from your proxy will appear to the server as coming from the proxy’s IPv4 address, so you’ll lose all visibility of the client’s source IP.
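
To show what TCP-level proxying means, here is a minimal Python sketch using asyncio. It is not HAProxy and not how HAProxy is implemented; it only illustrates the idea of accepting a connection on one address and relaying the raw bytes to another, without ever looking inside them. The function names are mine.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes one way until the sender finishes, then signal end of stream."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()
    except OSError:
        pass  # the other side went away; the handler cleans up below


async def start_proxy(listen_host, listen_port, backend_host, backend_port):
    """Listen on one address and relay every connection to the backend."""

    async def handle(client_reader, client_writer):
        try:
            backend_reader, backend_writer = await asyncio.open_connection(
                backend_host, backend_port
            )
        except OSError:
            client_writer.close()  # backend unreachable, drop the client
            return
        try:
            # Relay both directions at once. The bytes are never parsed, so an
            # SSL session passes through untouched to the backend.
            await asyncio.gather(
                pipe(client_reader, backend_writer),
                pipe(backend_reader, client_writer),
            )
        finally:
            backend_writer.close()
            client_writer.close()

    return await asyncio.start_server(handle, listen_host, listen_port)
```

Inside a coroutine, something like server = await start_proxy("::", 8080, "192.0.2.10", 80) followed by await server.serve_forever() would listen on IPv6 and forward to an IPv4 backend. The addresses and port here are placeholders. Notice that the backend only ever sees connections opened by the proxy, which is exactly why the client’s source IP is lost.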

Read on to see the proof of concept in action.

HTTP Dynamic Streaming

I’d like to demo HTTP Dynamic Streaming for you.

Benefits:

  • Adaptive bitrate: seamlessly switches between streams as your bandwidth conditions change.
  • Jump point navigation: can jump to any point in the movie instantly.
  • Only download what you watch: holds only a small amount of buffer at a time, which saves on bandwidth charges.
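
The adaptive bitrate idea can be sketched in a few lines of Python. This is my own simplified rule (pick the highest stream that fits within a safety margin of the measured bandwidth), not the logic the Flash player actually uses, and the bitrates in the usage note are made-up examples.

```python
def pick_bitrate(available_kbps, measured_kbps, safety=0.8):
    """Pick the highest stream bitrate that fits within the bandwidth budget."""
    budget = measured_kbps * safety  # leave headroom so playback does not stall
    fitting = [rate for rate in sorted(available_kbps) if rate <= budget]
    # If nothing fits, fall back to the lowest-quality stream.
    return fitting[-1] if fitting else min(available_kbps)
```

With streams encoded at 500, 1000 and 3000 kbps, a client measuring 4000 kbps would get the 3000 kbps stream, and one measuring 1500 kbps would drop to 1000 kbps. Re-running the choice as measurements change is what produces the switching.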

I’m going to point it to this URL:

http://zeridemo-f.akamaihd.net/content/robinhood/robin_hood_25fps_3000-2.f4m


swfobject force video when filename has missing extension

When a video won’t play and the video file doesn’t have a valid extension, you may need to tell swfobject that it is a video file by passing this flashvar:
provider=video

<html>
<head>
<title>Video Test</title>
<script type='text/javascript' src='swfobject.js'></script>
</head>
<body>

<!--div id="daVideo">
        <P>Alternate Text</P>
</div-->

<object classid='clsid:D27CDB6E-AE6D-11cf-96B8-444553540000' width='640' height='360' id='player1' name='player1'>
   <param name='movie' value='player.swf'>
   <param name='allowfullscreen' value='true'>
   <param name='allowscriptaccess' value='always'>
   <param name='flashvars' value='file=http://HOST/filename&image=http://HOST/imagename&provider=video&autostart=true'>
   <embed id='player1'
          name='player1'
          src='player.swf'
          width='640'
          height='360'
          allowscriptaccess='always'
          allowfullscreen='true'
          flashvars="file=http://HOST/filename&image=http://HOST/imagename&provider=video&autostart=true"
   />
</object>

<!--script type="text/javascript" >
/*
// Baseline:
var poster = "http://HOST/imagename";
var video = "http://HOST/filename";
var player = "http://HOST/video/player.swf";

if ((navigator.userAgent.indexOf('iPhone') != -1) || (navigator.userAgent.indexOf('iPod') != -1) || (navigator.userAgent.indexOf('iPad') != -1))
        document.getElementById("daVideo").innerHTML = '<video width="640" height="360" poster="'+poster+'" src="'+video+'" controls />';
else
{
  var so = new SWFObject(player,'mpl','640','360','9');
  so.addParam('allowfullscreen','true');
  so.addParam('allowscriptaccess','always');
  so.addParam('wmode','opaque');
  so.addVariable('file', video);
  so.addVariable('image', poster);
  so.addVariable('provider','video');
  so.write('daVideo');
}
*/
</script-->
</body>
</html>

Hadoop Streaming with PHP

I’ve started my journey with Hadoop, and the first thing I wanted to try was Streaming, so I could run the mapper and reducer methods with PHP programs.

The first thing I did was set up an alias:

alias stream='/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.18.3-streaming.jar'
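
For context on what Streaming expects from a mapper and reducer: each is just a program that reads lines on stdin and writes tab-separated key/value lines on stdout, and the reducer receives its input sorted by key. Here is a minimal word-count sketch of that contract. It is written in Python purely for illustration (the PHP scripts in this article would follow the same stdin/stdout shape), and the function names are mine.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit 'word<TAB>1' for every word on every input line."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts for each word; input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"
```

In a real job, each function would live in its own script that loops over sys.stdin and prints the results, and Hadoop would do the sorting between the two stages.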


IPv6 Presentation, Introduction to IPv6

I am doing a presentation on IPv6 at my company’s TechFest. This is a day-long event with keynote speakers and breakout sessions. The purpose of TechFest is to give the developers and engineers a break from their day-to-day activity and a view of what’s going on around the company and in the industry.

In this article, I’m copy/pasting my slide deck, and stripping out the company specific information, making this a generic Introduction to IPv6.

The Agenda for Today:

What is IPv6? (~10 minutes)
DNS (~10 minutes)
Getting Started (~10 minutes)
Web Application Development (~10 minutes)
