What is Hadoop?
As I mentioned in a previous post, Hadoop is a big data computing framework.
What is a Raspberry Pi?
The Raspberry Pi (RPi) is a small, inexpensive (34 euros) ARM-based computer (yes, a computer, not just a board!). It is meant primarily as an educational tool.
Is Hadoop on the Pi for production?
No. Compute performance is very poor.
The main aim of this experiment is to learn how to set up a Hadoop cluster using the cheapest method, which happens to be RPis.
Could I use this guide to set up a non-Raspberry Pi based (real) Hadoop cluster?
The same method can be used for other, non-RPi-based clusters.
4x Raspberry Pis
1x 8 port 10/100 Switch
1x Powered 4 port USB hub
4x 2′ Cat 5 cables
4x 2′ USB “power” cables
4x Class 10 8GB SDHC Cards
To power the Pis I used the iPad power adapter, since it converts 220 V to 5 V at 2.1 A. Each Pi consumes up to 500 mA, so 4 × 500 mA = 2 A, which stays within the adapter's 2.1 A.
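The power budget can be checked with a few lines of shell arithmetic. This is only a sketch: the function name `headroom_ma` is made up for illustration, the 500 mA per Pi is the upper bound quoted above, and anything plugged into the Pis' USB ports would add to the draw.

```shell
#!/bin/sh
# headroom_ma SUPPLY_MA PER_NODE_MA NODE_COUNT
# Prints how many milliamps are left once every node draws its maximum.
# (Hypothetical helper, not part of any Raspberry Pi tooling.)
headroom_ma() {
    echo $(( $1 - $2 * $3 ))
}

# iPad adapter: 2100 mA; four Pis at up to 500 mA each.
headroom_ma 2100 500 4
```

A negative result would mean the supply is undersized for the number of nodes.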
Total cost: about 230 euros.
If you are looking for more performance, you can buy one of these *_*!
Before starting, here are some useful links I used to write this guide:
Running Hadoop on Ubuntu Linux (Multi-Node Cluster)
Running Hadoop on Ubuntu Linux (Single-Node Cluster)
Raspberry Pi Preparation
We’ll do the master node by itself first so that this guide can be used for a single or multi-node setup.
RPI OS install and configuration
- Download the “Raspbian wheezy” image from here and follow these directions to set up your first SD card. Soft-float isn’t necessary.
- Hook up a monitor and keyboard and start up the device by connecting up the USB power.
- When the Raspberry Pi config tool launches, change the keyboard layout first. If you set passwords, etc. with the wrong keyboard layout, you may have a hard time later.
- Change the timezone, then the locale (language, etc.).
- Change the default user password.
- Configure the Pi to disable the X environment on boot (boot_behavior -> straight to desktop -> no).
- Enable the SSH daemon (Advanced -> A4 SSH), then exit the raspi-config tool.
- Reboot the Pi (sudo reboot).
Before going any further with the Pi, you have to split the memory so that the remaining unused space can be used as a hard drive.
Each node must have a unique name and a static IP. You can use the Debian instructions, since the Raspbian build is based on Debian.
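Before editing any files, it can help to write down which name and address each node will get. The sketch below prints such a plan for the four Pis. The hostnames (node1 to node4), the 192.168.0.x subnet and the starting address .101 are assumptions chosen for illustration, so replace them with values that fit your own network.

```shell
#!/bin/sh
# All values below are hypothetical; adjust them to your network.
prefix="node"        # hostnames will be node1, node2, ...
subnet="192.168.0"   # assumed private /24 network
first_host=101       # assumed first static address (.101)
count=4              # the four Pis in this cluster

# Print one "IP<TAB>hostname" line per node.
hosts_plan() {
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s.%d\t%s%d\n' "$subnet" $((first_host + i - 1)) "$prefix" "$i"
        i=$((i + 1))
    done
}

hosts_plan
```

Because each name and address is derived from the same counter, no two nodes can end up with the same hostname or IP.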
Change the machine host name first:
sudo nano /etc/hostname