Introduction to Hadoop Part 4

Hey guys,

In the previous articles, we discussed in detail the Hadoop core stack components (HDFS, YARN, MapReduce) and the data manipulation stack components (Apache Pig, Apache HBase, Apache Hive, Apache Cassandra, Apache Spark, Apache Storm, Apache Sqoop, etc.). In this article, we will explore the third stack of the Hadoop ecosystem: the coordination stack.

So, let’s begin…

Coordination stack tools coordinate the various services in the Hadoop ecosystem. They work with the core stack and data manipulation stack components in a distributed environment. These tools save a lot of time by handling services such as synchronization, configuration maintenance, and group services. Popular coordination stack tools are:

· Apache ZooKeeper

· Apache Oozie

· Apache Atlas

Apache ZooKeeper

It is an open-source service for highly reliable distributed coordination of applications in a distributed environment. Hadoop uses ZooKeeper to manage and coordinate clusters. It lets distributed processes share data consistently by applying synchronization mechanisms. It offers services such as naming of nodes, synchronization, locking, and configuration management. Apache Ambari is sometimes mentioned next to ZooKeeper, but Ambari is a tool for provisioning, managing, and monitoring Hadoop clusters rather than a replacement for ZooKeeper. For more information, you can visit the official webpage of Apache ZooKeeper: https://zookeeper.apache.org/
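To make the naming, locking, and configuration services a bit more concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the ZooKeeper client API: the class `ToyCoordinator`, the function `lock_holder`, and paths such as `/locks/job-` are all names I made up for illustration. It only imitates two ideas ZooKeeper is known for: nodes that disappear when their owning session ends, and sequence-numbered nodes where the lowest number holds the lock.

```python
import itertools


class ToyCoordinator:
    """In-memory stand-in for a coordination service (NOT the ZooKeeper API)."""

    def __init__(self):
        self._nodes = {}                  # path -> (data, owning session or None)
        self._counter = itertools.count()

    def create(self, path, data=b"", session=None, sequential=False):
        """Create a node; sequential nodes get a zero-padded, increasing suffix."""
        if sequential:
            path = f"{path}{next(self._counter):010d}"
        if path in self._nodes:
            raise KeyError(f"node already exists: {path}")
        self._nodes[path] = (data, session)
        return path

    def get(self, path):
        """Return the data stored at a node."""
        return self._nodes[path][0]

    def owner(self, path):
        """Return the session that owns a node (None for persistent nodes)."""
        return self._nodes[path][1]

    def children(self, parent):
        """Return the direct children of a node, sorted by path."""
        prefix = parent.rstrip("/") + "/"
        return sorted(
            p for p in self._nodes
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def close_session(self, session):
        """Remove the nodes owned by a session, as if that client had died."""
        self._nodes = {p: v for p, v in self._nodes.items() if v[1] != session}


def lock_holder(coord, lock_dir):
    """The contender with the lowest sequence number holds the lock."""
    contenders = coord.children(lock_dir)
    return coord.owner(contenders[0]) if contenders else None


coord = ToyCoordinator()
coord.create("/config/batch_size", b"500")   # shared configuration value
coord.create("/locks/job-", session="worker-a", sequential=True)
coord.create("/locks/job-", session="worker-b", sequential=True)
```

In this toy model, `worker-a` holds the lock first, and calling `coord.close_session("worker-a")` hands it to `worker-b`. A real deployment would talk to a ZooKeeper ensemble through a client library instead.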

Apache Oozie

It is a workflow scheduling system used to manage Hadoop jobs. A number of technologies can run on top of Hadoop for different purposes, and a real-world big data application may use Oozie to schedule their jobs in the Hadoop environment. Oozie can combine different kinds of Hadoop jobs, including Java MapReduce, Pig, Hive, Sqoop, etc. For more details, you can refer to the official Apache Oozie webpage: https://oozie.apache.org/
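The core idea behind scheduling a chain of jobs can be sketched in a few lines of Python. This is only a toy model of dependency ordering, not how Oozie itself is used (real Oozie workflows are defined in XML). The action names and the helper functions `execution_order` and `run_workflow` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each action lists the actions it must wait for.
workflow = {
    "sqoop_import": set(),
    "pig_clean": {"sqoop_import"},
    "hive_load": {"sqoop_import"},
    "mapreduce_report": {"pig_clean", "hive_load"},
}


def execution_order(actions):
    """Return one valid run order in which every action follows its dependencies."""
    return list(TopologicalSorter(actions).static_order())


def run_workflow(actions, runner):
    """Call runner(name) for each action in order; stop at the first failure."""
    completed = []
    for name in execution_order(actions):
        if not runner(name):
            break
        completed.append(name)
    return completed
```

Here `sqoop_import` always runs first and `mapreduce_report` always runs last, while the two middle actions may run in either order. `runner` stands for whatever actually launches a job; in this sketch it is just a function that returns True on success.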

Apache Atlas

Because data regulations are strict and vary across the globe, and because the use of data raises legal questions, big data governance has become a critical concern for real-world applications. Apache Atlas is a platform that lets companies using Hadoop ensure that their data complies with governance policies. It provides mechanisms to manage metadata and to classify data into categories such as personally identifiable information (PII) and sensitive data. For more details, you can visit the official webpage of Apache Atlas: https://atlas.apache.org/
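The idea of attaching classifications to metadata can be pictured with a small Python sketch. This is a toy in-memory catalog, not the Apache Atlas API: the names `Asset`, `ToyCatalog`, and the example columns and tags are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A piece of metadata about a data asset, such as a table column."""
    name: str
    kind: str
    classifications: set = field(default_factory=set)


class ToyCatalog:
    """In-memory metadata catalog (NOT the Apache Atlas API)."""

    def __init__(self):
        self._assets = {}

    def register(self, name, kind):
        """Record a new asset in the catalog."""
        self._assets[name] = Asset(name, kind)

    def classify(self, name, tag):
        """Attach a classification tag to an existing asset."""
        self._assets[name].classifications.add(tag)

    def find(self, tag):
        """Return the names of all assets carrying the given tag, sorted."""
        return sorted(
            a.name for a in self._assets.values() if tag in a.classifications
        )


catalog = ToyCatalog()
catalog.register("customers.email", "column")
catalog.register("customers.signup_date", "column")
catalog.register("payments.card_number", "column")
catalog.classify("customers.email", "PII")
catalog.classify("payments.card_number", "PII")
catalog.classify("payments.card_number", "SENSITIVE")
```

With the tags in place, a question like "which columns hold PII?" becomes a simple lookup, `catalog.find("PII")`, which is the kind of query a governance team needs to answer for a compliance audit.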

Conclusion

In this article, I explored the coordination stack of the Hadoop ecosystem and briefly discussed Apache ZooKeeper, Oozie, and Atlas. These technologies are used in big data and distributed computing environments. I am sure this will provide a basic understanding of the Hadoop ecosystem. If you are interested in diving deeper into big data technologies and the Hadoop ecosystem, this might be the starting point for you.

To wrap up, feel free to share your comments. Your likes and comments will help me present content in a better way. See you next week.

Professor (Big Data Analytics)||Adani Institute of Digital Technology Management (AIDTM) || Adani Group || Gandhinagar, Gujarat, India

Virendra Kumar Shrivastava
