Introduction to Hadoop Part 4
In the previous articles, we discussed in detail the Hadoop core stack components (HDFS, YARN, MapReduce) and the data manipulation stack components (Apache Pig, Apache HBase, Apache Hive, Apache Cassandra, Apache Spark, Apache Storm, Apache Sqoop etc.). In this article, we will explore the third stack of the Hadoop ecosystem: the coordination stack.
So, let’s begin…
Coordination stack tools coordinate the various services in the Hadoop ecosystem. They work with the core stack and data manipulation stack components in a distributed environment. These tools save a lot of time by handling tasks such as synchronization, configuration maintenance and grouping. Popular coordination stack tools are:
· Apache ZooKeeper
· Apache Oozie
· Apache Atlas
Apache ZooKeeper is an open-source service for highly reliable coordination of applications in a distributed environment. Hadoop uses it to manage and coordinate clusters. It lets distributed processes share data without inconsistencies by applying different synchronization mechanisms. It offers services such as naming of nodes, synchronization, locking and configuration management. Apache Ambari is sometimes mentioned as an alternative, although it focuses on provisioning, managing and monitoring Hadoop clusters rather than on low-level coordination. For more information, you can visit the official webpage of Apache ZooKeeper: https://zookeeper.apache.org/
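The locking service mentioned above can be pictured with a toy model. The sketch below is plain Python and does not use the ZooKeeper client API; the class ToyCoordinator and its methods are hypothetical names invented for illustration. It loosely mimics the idea behind ZooKeeper's lock recipe, where each client registers a sequentially numbered entry and the client with the lowest number holds the lock.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service (NOT the ZooKeeper API)."""

    def __init__(self):
        self._next_seq = 0   # ever-increasing sequence number
        self._waiting = {}   # sequence number -> client name

    def join_queue(self, client):
        """Register a client and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._waiting[seq] = client
        return seq

    def leave_queue(self, seq):
        """Remove a client, e.g. when it releases the lock or disconnects."""
        self._waiting.pop(seq, None)

    def lock_holder(self):
        """The client with the lowest sequence number holds the lock."""
        if not self._waiting:
            return None
        return self._waiting[min(self._waiting)]


coord = ToyCoordinator()
seq_a = coord.join_queue("worker-a")
seq_b = coord.join_queue("worker-b")

holder_before = coord.lock_holder()   # worker-a registered first
coord.leave_queue(seq_a)              # worker-a releases the lock
holder_after = coord.lock_holder()    # the lock passes to worker-b
print(holder_before, holder_after)
```

In a real cluster the queue would live in ZooKeeper rather than in one process, so every node sees the same order even when machines fail.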
Apache Oozie is a workflow scheduling system used to manage Hadoop jobs. A number of technologies run on top of Hadoop for different purposes, and a big data application can use Oozie to schedule its jobs in the Hadoop environment. Oozie can integrate different kinds of Hadoop jobs, including Java MapReduce, Pig, Hive and Sqoop, into a single workflow. For more details, you can refer to the official Apache Oozie webpage: https://oozie.apache.org/
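To make the idea of chaining different job types concrete, here is a minimal sketch in plain Python. Oozie itself describes workflows in XML files, so this is not Oozie syntax; the action names and the execution_order helper are hypothetical. It only illustrates how a scheduler can derive a valid run order from the dependencies between jobs, using the standard graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions it waits for.
workflow = {
    "sqoop-import": set(),                  # load data into Hadoop first
    "pig-clean": {"sqoop-import"},          # clean the imported data
    "hive-aggregate": {"pig-clean"},        # aggregate the cleaned data
    "mapreduce-report": {"hive-aggregate"}, # build the final report
}


def execution_order(actions):
    """Return the actions in an order that respects every dependency."""
    return list(TopologicalSorter(actions).static_order())


order = execution_order(workflow)
print(order)
```

A workflow scheduler adds much more on top of this, such as triggering by time or data availability and handling failed actions, but the dependency ordering is the core idea.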
Because data regulations are strict and vary across the globe, and legal questions around data usage keep arising, big data governance has become a critical concern for real-world applications. Apache Atlas is a platform that helps companies using Hadoop ensure that their data complies with governance policies. It provides mechanisms to manage metadata and to classify data into categories such as personally identifiable information (PII) and sensitive data. For more details, you can visit the official webpage of Apache Atlas: https://atlas.apache.org/
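The classification idea can be illustrated with a small, self-contained Python sketch. It does not call the Atlas API; the ColumnMetadata class, the columns_with helper and the sample catalog are all hypothetical. It simply shows how tagging metadata with classifications such as PII makes it possible to find every piece of data that a governance policy applies to.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    """Hypothetical metadata record for one table column."""
    table: str
    column: str
    classifications: set = field(default_factory=set)


# Made-up sample catalog, for illustration only.
catalog = [
    ColumnMetadata("customers", "email", {"PII"}),
    ColumnMetadata("customers", "phone", {"PII", "SENSITIVE"}),
    ColumnMetadata("orders", "total_amount"),
]


def columns_with(entries, classification):
    """List the fully qualified columns that carry a given classification."""
    return [
        f"{entry.table}.{entry.column}"
        for entry in entries
        if classification in entry.classifications
    ]


pii_columns = columns_with(catalog, "PII")
print(pii_columns)
```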
In this article, I explored the coordination stack of the Hadoop ecosystem and briefly discussed Apache ZooKeeper, Oozie and Atlas. These technologies are used in big data and distributed computing environments. I am sure this will provide a basic understanding of the Hadoop ecosystem. If you are interested in diving deeper into big data technologies and the Hadoop ecosystem, this might be the starting point for you.
To wrap up, feel free to share your comments. Your likes and comments will help me present the content in a better way. See you next week.