I’ve started my journey with Hadoop, and the first thing I wanted to try was Streaming, so that I could write the mapper and reducer as PHP scripts.
The first thing I did was set up an alias for the streaming jar:
alias stream='/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.18.3-streaming.jar'
Next, I created a scripts directory in my $HADOOP_HOME (/usr/local/hadoop) directory to hold the mapper and reducer.
wc_mapper.php
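#!/usr/bin/php
<?php
// Hide notices and warnings: on the CLI they can be printed to stdout,
// where they would be mixed into the key/value output.
error_reporting(0);

$in = fopen("php://stdin", "r");
$results = array();

// Count each word in this mapper's input split.
while (($line = fgets($in)) !== false)
{
    // Split on non-word characters, dropping empty pieces.
    $words = preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word)
    {
        if (!isset($results[$word]))
            $results[$word] = 0;
        $results[$word] += 1;
    }
}
fclose($in);

// Emit one tab-separated "word<TAB>count" line per word.
foreach ($results as $key => $value)
    print "$key\t$value\n";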
wc_reducer.php
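#!/usr/bin/php
<?php
// Hide notices and warnings so they cannot leak into the output on stdout.
error_reporting(0);

$in = fopen("php://stdin", "r");
$results = array();

// Each input line is "word<TAB>count", as emitted by the mapper(s).
while (($line = fgets($in)) !== false)
{
    $parts = explode("\t", trim($line), 2);
    if (count($parts) < 2)
        continue; // skip blank or malformed lines
    list($key, $value) = $parts;
    if (!isset($results[$key]))
        $results[$key] = 0;
    $results[$key] += (int) $value;
}
fclose($in);

// Output the totals sorted by word.
ksort($results);
foreach ($results as $key => $value)
    print "$key\t$value\n";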
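Hadoop Streaming sorts the mapper output by key before it reaches the reducer, so all lines for a given word arrive next to each other. The reducer above does not rely on that and keeps every word in an array. A minimal alternative sketch, assuming the default tab-separated key/value format, keeps only one running total at a time; reduce_sorted() is a hypothetical helper name, and the demo feeds it an in-memory stream instead of STDIN.

```php
<?php
// Sketch of a reducer that relies on its input already being sorted by key.
// Only the current key and its running sum are held in memory.
// reduce_sorted() is a hypothetical helper, not part of Hadoop or PHP.
function reduce_sorted($in, $out)
{
    $currentKey = null;
    $sum = 0;
    while (($line = fgets($in)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue; // skip blank lines
        }
        $parts = explode("\t", $line, 2);
        $key = $parts[0];
        $value = isset($parts[1]) ? (int) $parts[1] : 0;
        if ($key !== $currentKey) {
            // Key changed: flush the finished group before starting a new one.
            if ($currentKey !== null) {
                fwrite($out, "$currentKey\t$sum\n");
            }
            $currentKey = $key;
            $sum = 0;
        }
        $sum += $value;
    }
    // Flush the last group, if there was any input at all.
    if ($currentKey !== null) {
        fwrite($out, "$currentKey\t$sum\n");
    }
}

// Demo: an in-memory stream stands in for STDIN (input already sorted by key).
$in = fopen('php://memory', 'r+');
fwrite($in, "apple\t2\napple\t1\nbanana\t3\n");
rewind($in);
$out = fopen('php://memory', 'r+');
reduce_sorted($in, $out);
rewind($out);
$result = stream_get_contents($out);
echo $result; // prints "apple\t3" and "banana\t3" on separate lines
```

Used as the actual reducer script, the demo lines would be replaced by a single call to reduce_sorted(STDIN, STDOUT);. The trade-off is that this version is only correct when its input is sorted by key, which Hadoop guarantees for the reduce phase but a hand-fed test file might not.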
Make both scripts executable (chmod +x) so that Hadoop can run them directly, then launch the job:
stream -input conf -output output4 -mapper /usr/local/hadoop/scripts/wc_mapper.php -reducer /usr/local/hadoop/scripts/wc_reducer.php
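Between the two scripts, Hadoop sorts all mapper output by key, and the same word can arrive from several mappers, which is why the reducer has to add the counts up again. A minimal single-process sketch of that data flow, assuming two small hypothetical input splits, is below; map_split(), shuffle_sort() and reduce_pairs() are illustrative names, not Hadoop APIs. The map and reduce logic mirrors wc_mapper.php and wc_reducer.php.

```php
<?php
// Map: per-split word counts as "word<TAB>count" lines (in-mapper aggregation).
function map_split(array $lines)
{
    $counts = array();
    foreach ($lines as $line) {
        foreach (preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY) as $word) {
            $counts[$word] = (isset($counts[$word]) ? $counts[$word] : 0) + 1;
        }
    }
    $out = array();
    foreach ($counts as $word => $n) {
        $out[] = $word . "\t" . $n;
    }
    return $out;
}

// Shuffle: sort all mapper output by key so equal keys become adjacent.
function shuffle_sort(array $pairs)
{
    usort($pairs, function ($a, $b) {
        return strcmp(strstr($a, "\t", true), strstr($b, "\t", true));
    });
    return $pairs;
}

// Reduce: sum the partial counts for each key.
function reduce_pairs(array $sorted)
{
    $totals = array();
    foreach ($sorted as $line) {
        list($key, $value) = explode("\t", $line, 2);
        $totals[$key] = (isset($totals[$key]) ? $totals[$key] : 0) + (int) $value;
    }
    return $totals;
}

// Two hypothetical input splits, each handled by its own "mapper".
$splits = array(
    array("the cat sat", "the end"),
    array("the cat ran"),
);
$mapped = array();
foreach ($splits as $split) {
    $mapped = array_merge($mapped, map_split($split));
}
$totals = reduce_pairs(shuffle_sort($mapped));
foreach ($totals as $word => $n) {
    print "$word\t$n\n"; // cat 2, end 1, ran 1, sat 1, the 3 (tab-separated)
}
```

Note that "the" and "cat" appear in both splits, so each mapper emits its own partial count and only the reducer sees the combined total.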
I’ll come back later and document this properly; for now I just wanted to get the initial version recorded.