Install Hadoop on AWS Ubuntu Instance

October 15, 2015

Step 1: Create an Ubuntu 14.04 LTS instance on AWS.

Step 2: Connect to the instance.

chmod 400 yourKey.pem
ssh -i yourKey.pem ubuntu@your_instance_ip

Step 3: Install Java.

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java6-installer
sudo update-java-alternatives -s java-6-oracle
sudo apt-get install oracle-java6-set-default

Step 4: Add a Hadoop user.

sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser

Step 5: Create an SSH key for password-free login.

su - hduser
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
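A common pitfall at this point (not part of the original steps, but worth checking): sshd refuses to use keys when the .ssh directory or authorized_keys file is group- or world-readable. Tightening the permissions avoids an unexpected password prompt in the next step:

```shell
# Ensure permissions sshd will accept; mkdir -p and touch are no-ops if the files already exist
mkdir -p "$HOME/.ssh"
touch "$HOME/.ssh/authorized_keys"
chmod 700 "$HOME/.ssh"
chmod 600 "$HOME/.ssh/authorized_keys"
```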

Step 6: Test the connection.

ssh localhost
exit

Step 7: Download and install Hadoop.

cd /usr/local
sudo wget http://apache.01link.hk/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
sudo tar -xzvf hadoop-1.2.1.tar.gz
sudo mv hadoop-1.2.1 hadoop
sudo chown -R hduser:hadoop hadoop
sudo rm hadoop-1.2.1.tar.gz

Step 8: Update .bashrc.

su - hduser
vim $HOME/.bashrc

# Add the following content to the end of the file:
export HADOOP_PREFIX=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
export PATH=$PATH:$HADOOP_PREFIX/bin

Save with :wq, then reload the file so the changes take effect:

source ~/.bashrc

Step 9: Configure Hadoop, logged in as hduser.

cd /usr/local/hadoop/conf
vim hadoop-env.sh

# Add the following lines to the file:
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
export HADOOP_CLASSPATH=/usr/local/hadoop

Save and exit with :wq.

Step 10: Create a temporary directory for Hadoop.

exit
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp

Step 11: Add configuration snippets.

su - hduser
cd /usr/local/hadoop/conf
vim core-site.xml

# Put the following content between <configuration> ... </configuration> tags:

Include your Hadoop configuration here.

# Save and exit with :wq

Continue with configuring your additional files as needed.
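As a reference, a classic single-node Hadoop 1.x setup commonly uses values along these lines (the 54310/54311 ports are a widely used convention, not a requirement; the tmp path is the one created in Step 10). Each snippet goes between the <configuration> ... </configuration> tags of the named file:

```xml
<!-- core-site.xml -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>Base for other temporary directories (created in Step 10).</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>URI of the default file system.</description>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>Host and port of the MapReduce JobTracker.</description>
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Single node, so a replication factor of 1.</description>
</property>
```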

Step 12: Format the HDFS.

/usr/local/hadoop/bin/hadoop namenode -format

Step 13: Start Hadoop.

/usr/local/hadoop/bin/start-all.sh

Step 14: Check that all processes are up and running.

jps
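On a healthy single-node Hadoop 1.x cluster, `jps` should list five Hadoop daemons besides Jps itself: NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker. The sketch below checks a sample `jps` output for all five (the PIDs shown are made up for illustration); on the instance, set JPS_OUTPUT=$(jps) instead of using the here-doc:

```shell
#!/bin/sh
# Sample output for illustration; on the instance use: JPS_OUTPUT=$(jps)
JPS_OUTPUT=$(cat <<'EOF'
2543 NameNode
2687 DataNode
2855 SecondaryNameNode
2931 JobTracker
3102 TaskTracker
3150 Jps
EOF
)
MISSING=""
for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
  if printf '%s\n' "$JPS_OUTPUT" | grep -q "$d"; then
    echo "$d: running"
  else
    echo "$d: MISSING"
    MISSING="$MISSING $d"
  fi
done
[ -z "$MISSING" ] && echo "All Hadoop daemons are up."
```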

Step 15: To stop Hadoop, type the following command:

/usr/local/hadoop/bin/stop-all.sh

Step 16: To start Hadoop again, rerun the start script:

/usr/local/hadoop/bin/start-all.sh

You are now ready to rock! Have fun :)


Software development professional with expertise in application architecture, cloud solutions deployment, and financial products development. Holds a Master's degree in Computer Science and an MBA in Finance. Highly skilled in AWS (Certified Solutions Architect, Developer, and SysOps Administrator), GCP (Professional Cloud Architect), Microsoft Azure, Kubernetes (CKA, CKAD, CKS, KCNA), and Scrum (PSM, PSPO) methodologies. Happy to connect.