AWS in Plain English

New AWS, Cloud, and DevOps content every day. Follow to join our 3.5M+ monthly readers.


How to install Apache Pig on a Hadoop cluster on Ubuntu


Hey guys,

In this article, I will briefly explain what Apache Pig is and show how to install it on a Hadoop cluster. I will provide step-by-step instructions for setting up Pig on Ubuntu 16.04. So, if you are curious about Pig and its installation process, this article can be a stepping stone for your Apache Pig journey. The agenda for discussion is as follows:

What is Apache Pig?

Step-by-step Apache Pig installation

So, let us begin…

What is Apache Pig?

Pig is a scripting platform designed to process and analyze large datasets. It was developed in late 2006 by Yahoo researchers and later became an Apache open-source project. Pig enables users and analysts to express complicated transformations in a simple manner. It runs on Hadoop clusters and works directly with the data stored there. Besides Java, Pig offers another language in which MapReduce programs can be written: Apache Pig converts Pig scripts into MapReduce jobs, which run on Hadoop YARN (Yet Another Resource Negotiator) and process datasets stored in HDFS (Hadoop Distributed File System). Pig has two components:

1. The Pig Latin scripting language

2. A runtime engine

To use Apache Pig, first set up a Hadoop cluster and then install Pig on the Hadoop cluster.
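To give a feel for how compact a Pig Latin transformation can be, here is a minimal sketch of the classic word count. The file names (input.txt, wordcount.pig) and the temporary work directory are my own illustrative choices. The script only runs if Pig is already installed, and it uses local mode, so no running cluster is needed.

```shell
#!/usr/bin/env bash
# Sketch: write a tiny Pig Latin word-count script and run it in local mode.
# The work directory and file names are illustrative assumptions.
workdir="$(mktemp -d)"

# A small sample input file.
printf 'pig hadoop pig\nhadoop yarn\n' > "$workdir/input.txt"

cat > "$workdir/wordcount.pig" <<'EOF'
-- Load each line of the input file as a single string.
lines   = LOAD 'input.txt' AS (line:chararray);
-- Split every line into words, one word per record.
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Group identical words and count the records in each group.
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS total;
DUMP counts;
EOF

# Run only if Pig is on the PATH; otherwise just report where the script is.
if command -v pig >/dev/null 2>&1; then
  (cd "$workdir" && pig -x local wordcount.pig)
else
  echo "pig not found on PATH; script saved at $workdir/wordcount.pig"
fi
```

In MapReduce mode, the same script would be compiled into MapReduce jobs and would read its input from HDFS instead of the local file system.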

Step-by-step installation of Apache Pig on a Hadoop cluster on Ubuntu

Prerequisites:

· Ubuntu 16.04 or a higher version (I have installed Ubuntu in Oracle VM (Virtual Machine) VirtualBox),

· Hadoop running on Ubuntu (I have installed Hadoop 3.2.1 on Ubuntu 16.04). For the Hadoop installation, you may refer to my blog post on how to install Hadoop.
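Before starting, it can help to confirm that Java and Hadoop are actually reachable from your shell. The following is a small sketch; the helper name have_cmd is my own, and it only checks that the commands are on the PATH, not that the cluster is configured correctly.

```shell
#!/usr/bin/env bash
# Return success if the given command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report the status of each prerequisite tool.
for tool in java hadoop; do
  if have_cmd "$tool"; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: NOT found on PATH"
  fi
done

# If both are present, these commands print the installed versions:
#   java -version
#   hadoop version
```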

Pig installation steps

Step 1: Log in to Ubuntu.

Step 2: Go to https://pig.apache.org/releases.html and copy the download link of the Pig version that you want to install. Run the following command to download Apache Pig on Ubuntu:

$ wget https://dlcdn.apache.org/pig/pig-0.16.0/pig-0.16.0.tar.gz
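If you prefer not to hard-code the version, the download URL can be built from a variable. This is only a sketch: the function name pig_url is hypothetical, and the URL pattern is inferred from the single link used in this article, so check the releases page to confirm where the version you want is actually hosted.

```shell
#!/usr/bin/env bash
# Build the download URL for a given Pig version.
# Assumption: other releases follow the same path layout as pig-0.16.0.
pig_url() {
  local version="$1"
  echo "https://dlcdn.apache.org/pig/pig-${version}/pig-${version}.tar.gz"
}

PIG_VERSION="0.16.0"
echo "Download URL: $(pig_url "$PIG_VERSION")"

# To download the tarball, run:
#   wget "$(pig_url "$PIG_VERSION")"
```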

Step 3: To extract the pig-0.16.0.tar.gz file, run the following command:

$ tar xvzf pig-0.16.0.tar.gz

Step 4: To rename the extracted pig-0.16.0 folder to pig, execute the following command:

$ sudo mv /home/hdoop/pig-0.16.0 /home/hdoop/pig
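As an alternative, Steps 3 and 4 can be combined by extracting the archive straight into the target folder. The sketch below assumes the tarball contains a single top-level pig-<version> directory; the function name install_pig is my own.

```shell
#!/usr/bin/env bash
# Extract a Pig tarball directly into the destination directory.
# --strip-components=1 drops the leading pig-<version>/ directory,
# so the contents land in "$dest" without a separate rename step.
install_pig() {
  local tarball="$1" dest="$2"
  mkdir -p "$dest" || return 1
  tar -xzf "$tarball" -C "$dest" --strip-components=1
}

# Example, using the paths from this article:
#   install_pig /home/hdoop/pig-0.16.0.tar.gz /home/hdoop/pig
```

Because the files stay inside your own home directory, this approach does not need sudo.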

Step 5: Now open the .bashrc file in your home directory to add the path and environment variables for Pig. The file belongs to your own user, so sudo is not needed. Run the following command:

$ nano ~/.bashrc

Add the following lines at the end of the .bashrc file and save the file.

#PIG settings
export PIG_HOME=/home/hdoop/pig
export PATH=$PATH:$PIG_HOME/bin
export PIG_CLASSPATH=$PIG_HOME/conf:$HADOOP_INSTALL/etc/hadoop/
export PIG_CONF_DIR=$PIG_HOME/conf
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PIG_CLASSPATH=$PIG_CONF_DIR:$PATH
#PIG setting ends

Step 6: From your home directory, run the following command to apply the changes made to the .bashrc file:

$ source .bashrc
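After reloading .bashrc, a quick sanity check can confirm that the settings took effect. This is a minimal sketch; the function name check_pig_env is hypothetical, and it only verifies that PIG_HOME is set and that its bin directory is on the PATH.

```shell
#!/usr/bin/env bash
# Verify that PIG_HOME is set and that $PIG_HOME/bin is on the PATH.
check_pig_env() {
  if [ -z "${PIG_HOME:-}" ]; then
    echo "PIG_HOME is not set"
    return 1
  fi
  # Wrap PATH in colons so the match works for first and last entries too.
  case ":$PATH:" in
    *":$PIG_HOME/bin:"*)
      echo "PIG_HOME=$PIG_HOME and its bin directory is on PATH"
      ;;
    *)
      echo "$PIG_HOME/bin is missing from PATH"
      return 1
      ;;
  esac
}

check_pig_env || echo "Re-check the PIG settings in ~/.bashrc"

# Once the check passes, this prints the installed Pig version:
#   pig -version
```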

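Step 7: To start all Hadoop daemons, navigate to the hadoop-3.2.1/sbin folder and run the following commands. The final jps command lists the running Java processes, so you can confirm that the daemons have started:

$ ./start-dfs.sh
$ ./start-yarn.sh
$ jps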
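Reading the jps output by eye works, but the check can also be scripted. The sketch below assumes a single-node setup where NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager all run on the same machine; the function name missing_daemons is my own.

```shell
#!/usr/bin/env bash
# Print the name of every expected Hadoop daemon that is absent from
# the given jps output (one "<pid> <name>" entry per line).
missing_daemons() {
  local jps_output="$1" name
  for name in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the second field exactly, so that "NameNode" is not
    # mistaken for a match inside "SecondaryNameNode".
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$name" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$name"
    fi
  done
}

# Run the check only when jps is available on this machine.
if command -v jps >/dev/null 2>&1; then
  missing="$(missing_daemons "$(jps)")"
  if [ -z "$missing" ]; then
    echo "All expected daemons are running"
  else
    printf 'Not running:\n%s\n' "$missing"
  fi
fi
```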

Step 8: Now you can launch pig by executing the following command:

$ pig
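Running pig without arguments uses MapReduce mode, which needs the Hadoop daemons from Step 7. Pig can also be started in local mode with the -x flag, which is handy for quick experiments on local files. The wrapper below is only a sketch: the function name start_pig and the PIG_BIN override are hypothetical conveniences, and it covers just these two modes.

```shell
#!/usr/bin/env bash
# Start Pig in the requested execution mode (default: mapreduce).
# PIG_BIN is an illustrative override for the pig executable.
start_pig() {
  local mode="${1:-mapreduce}"
  case "$mode" in
    local|mapreduce) ;;
    *)
      echo "unsupported mode: $mode (expected local or mapreduce)" >&2
      return 1
      ;;
  esac
  "${PIG_BIN:-pig}" -x "$mode"
}

# Usage:
#   start_pig local   # runs against the local file system
#   start_pig         # runs against the Hadoop cluster
```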

Step 9: You are now in Pig's interactive shell (called Grunt) and can run your desired tasks. To exit the shell, use the quit command:

grunt> quit;

Conclusion

In this article, I have introduced Apache Pig and demonstrated how to install it on a Hadoop cluster on Ubuntu. So, friends, it is super simple to install Pig. Now, you can explore it and perform your desired operations.

To wrap up, feel free to share your comments. Your claps and comments will help me present content in a better way. See you next week.


Published in AWS in Plain English


Written by Dr. Virendra Kumar Shrivastava

Professor || Alliance College of Engineering and Design || Alliance University || Writer || Big Data Analytics
