Monthly Archives: April 2009

Parallel Distributed Computing Example

You may have seen the article Hadoop Example, AccessLogCountByHourOfDay. It is a distributed computing solution built on Hadoop. The purpose of this article is to dive into the theory behind it. To understand the power of distributed computing, we need to step … Continue reading

Posted in Code, Research | 1 Comment

Hadoop Example, AccessLogCountByHourOfDay

Inspired by an article written by Tom White, AWS author and developer: “Running Hadoop MapReduce on Amazon EC2 and Amazon S3”. Instead of counting by minute of the week, this one counts by hour of the day. I just find this more … Continue reading
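To make the idea concrete, here is a minimal Python sketch of counting access-log requests by hour of day in map/reduce style. It is not the Hadoop job from the post. It assumes Apache-style timestamps such as [10/Apr/2009:13:55:36 -0700], and the names map_line, reduce_counts and count_by_hour are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict

# Assumption: each log line carries an Apache-style timestamp,
# e.g. [10/Apr/2009:13:55:36 -0700]. The capture group is the hour.
HOUR_RE = re.compile(r"\[\d{2}/\w{3}/\d{4}:(\d{2}):\d{2}:\d{2}")


def map_line(line):
    """Map step: emit (hour, 1) for a log line; emit nothing if no timestamp is found."""
    match = HOUR_RE.search(line)
    if match:
        yield match.group(1), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts for each hour key."""
    totals = defaultdict(int)
    for hour, count in pairs:
        totals[hour] += count
    return dict(totals)


def count_by_hour(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(pairs)
```

In a real cluster the map step runs on many machines at once and the framework groups the emitted pairs by key before the reduce step; here both steps run in one process so the data flow is easy to follow.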

Posted in Code | Leave a comment

Inspiring MapReduce lectures by Google

Watching a set of three lectures on MapReduce, given at Google by Aaron Kimball, was inspiring to me. I feel like I have a much more solid grasp on MapReduce after watching these. I really liked how it started out … Continue reading

Posted in Research | Leave a comment

hadoop-0.18.3 Could not create the Java virtual machine

I installed Hadoop on a VM and needed to set the Java heap size lower than the default of 1000 MB (-Xmx1000m) to get it to work. I set the HADOOP_HEAPSIZE variable in the conf/hadoop-env.sh file to the lower value, but Hadoop continued … Continue reading

Posted in Code, Linux | 2 Comments

Fuse mounting HDFS on CentOS 5

The first step is to get fuse installed. It’s not as simple as “yum install fuse”, because it doesn’t ship with RHEL5/CentOS5.

wget http://dag.wieers.com/rpm/packages/RPM-GPG-KEY.dag.txt
rpm --import RPM-GPG-KEY.dag.txt
rm RPM-GPG-KEY.dag.txt
yum install yum-priorities
wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.3.6-1.el5.rf.i386.rpm
rpm -Uhv rpmforge-release-0.3.6-1.el5.rf.i386.rpm
rm rpmforge-release-0.3.6-1.el5.rf.i386.rpm
vim … Continue reading

Posted in Linux | Leave a comment

Holy Smokes, Hadoop works with S3 directly!

bin/hadoop fs -put /path/to/source s3://<s3id>:<s3secret>@<bucket>/path/to/destination

This is so cool. I’m guessing that I could also use S3 as my input or output directory for Map/Reduce jobs.

Posted in Linux | Leave a comment

Hadoop Streaming with PHP

I’ve started my journey with Hadoop, and the first thing I wanted to try was Streaming, so I could run the mapper and reducer methods with PHP programs. The first thing I did was set up an alias: alias stream='/usr/local/hadoop/bin/hadoop jar … Continue reading
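The contract that Streaming imposes on a mapper and reducer is the same in any language: the mapper reads lines and writes tab-separated key/value lines, and the reducer receives those lines sorted by key. The post uses PHP; the sketch below uses Python only to illustrate that contract with a word count. The word-count task and the names mapper, reducer and run_local are hypothetical and do not come from the post.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would write to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per key. Input must already be sorted by key,
    which is how Hadoop delivers mapper output to a streaming reducer."""
    keyed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_local(lines):
    """Simulate the job in one process: map, sort (standing in for the shuffle), reduce."""
    return reducer(sorted(mapper(lines)))
```

Calling run_local on a handful of lines is a cheap way to check the two steps before submitting them to a cluster, where each step would instead read from standard input and print to standard output.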

Posted in Uncategorized | Leave a comment

IPv6 Presentation, Introduction to IPv6

I am doing a presentation on IPv6 at my company’s TechFest. This is a day-long event with keynote speakers and breakout sessions. The purpose of TechFest is to give the developers and engineers a break from their day to … Continue reading

Posted in Uncategorized | Leave a comment