HIVE

Have you ever wondered how the honey you buy at the market is actually produced?

A colony of bees works really hard to collect nectar from flowers, turns it into honey and stores it in the hive. Since we humans are not capable of making honey ourselves, we collect it from their hive.

Hive works in much the same way. If you love writing SQL commands and are not so familiar with Java, Hive comes to the rescue. Just as humans collect honey from a beehive without knowing how to extract it from flowers, a SQL developer can work with data in Hadoop through Hive without knowing Java programming. You write SQL-like commands in Hive, and Hive internally converts them into jobs (classically Java MapReduce jobs) that run on the Hadoop cluster.
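
To make the idea of a query being turned into a distributed job a little more concrete, here is a toy sketch in Python. It is not the code Hive generates; it only imitates the map, shuffle and reduce steps that a simple GROUP BY query could be broken into. The table, the rows and the function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table (name, dept),
# split across two pretend machines.
partitions = [
    [("alice", "sales"), ("bob", "hr")],
    [("carol", "sales"), ("dave", "sales")],
]


def map_phase(rows):
    """Emit (dept, 1) for every row, like the map side of a GROUP BY."""
    return [(dept, 1) for _name, dept in rows]


def shuffle(mapped_outputs):
    """Collect all emitted values under their key, across all machines."""
    grouped = defaultdict(list)
    for output in mapped_outputs:
        for key, value in output:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values per key, like COUNT(*) per group."""
    return {key: sum(values) for key, values in grouped.items()}


# Similar in spirit to: SELECT dept, COUNT(*) FROM employees GROUP BY dept;
mapped = [map_phase(rows) for rows in partitions]  # one map task per partition
result = reduce_phase(shuffle(mapped))
print(result)
```

Each partition is mapped independently, which is the part a real cluster would run in parallel on different machines.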

HIVE provides the Hive Query Language (HiveQL), which is similar to SQL, so it gives us the flavour of an RDBMS. Writing a HIVE query is much like writing a SQL query in a SQL terminal. However, HIVE has some advantages and limitations compared with an RDBMS, which are listed below:

ADVANTAGES OF HIVE OVER RDBMS

HIVE is capable of processing very large datasets.
HIVE uses parallel processing: the data is distributed across several machines in a cluster and is processed on them in parallel.
HIVE provides a wide range of built-in functions, arguably broader than the set defined by standard SQL, and it can be extended with user-defined functions (UDFs).

LIMITATIONS OF HIVE COMPARED TO RDBMS

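Hive Query Language has traditionally supported only equi joins (joins on equality conditions); later releases relax this restriction.
Accessing a single record takes a long time, as the data is distributed across multiple systems and the records are not indexed.
HIVE is geared towards read operations: the actual data is stored in files in HDFS and HIVE reads it from there. Data is typically loaded or inserted in bulk rather than modified row by row.
HIVE does not support natural joins, and older versions do not support implicit (comma-separated) join notation either.
The UPDATE operation is not supported on ordinary HIVE tables; newer versions allow it only on tables configured as transactional (ACID).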
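
One way to see why equality joins fit a distributed engine so naturally: rows from both tables can be routed by their join key, so matching rows end up in the same place. A condition such as "less than" offers no single key to route on. The sketch below is a toy illustration of that idea in Python, not Hive's actual join implementation; the tables and the function name equi_join are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical tables: (emp_name, dept_id) and (dept_id, dept_name).
employees = [("alice", 10), ("bob", 20), ("carol", 10)]
departments = [(10, "sales"), (20, "hr"), (30, "legal")]


def equi_join(left, right):
    """Inner join on left.dept_id = right.dept_id by bucketing on the key."""
    # Each bucket holds the rows from both sides that share one key value.
    buckets = defaultdict(lambda: ([], []))
    for name, dept_id in left:
        buckets[dept_id][0].append(name)
    for dept_id, dept_name in right:
        buckets[dept_id][1].append(dept_name)

    joined = []
    for dept_id in sorted(buckets):
        names, dept_names = buckets[dept_id]
        # Pair every left row with every right row inside the same bucket.
        for name in names:
            for dept_name in dept_names:
                joined.append((name, dept_id, dept_name))
    return joined


joined_rows = equi_join(employees, departments)
print(joined_rows)
```

The department with id 30 has no employees, so it produces no output rows, just as in an inner join.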

HOW DOES HIVE MANAGE THE DATA

Just think for a moment. How is data stored in Hadoop?

Data in Hadoop is stored in files and is distributed across multiple machines in HDFS (Hadoop Distributed File System). But when you run a query in the HIVE terminal, you get the data in the form of tables. Since HIVE is getting the data from HDFS, someone must be working behind the scenes to present the files as tables. That someone is the metastore in HIVE.

Metastore in HIVE

In simple words, the metastore is a service backed by a relational database in which HIVE keeps its metadata. HIVE consults it whenever it executes a query. By default the metastore uses an embedded 'Derby' database.

The Metastore holds the metadata for the tables in Hive. It knows how each table is defined, that is, its schema: the column names and their types.

Most importantly, the Metastore maps directories and files in HDFS to tables.
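
The role of the metastore can be sketched with a small toy model in Python. This is only an illustration under simplifying assumptions, not how Hive stores its metadata: the dictionaries standing in for the metastore and the file system, the employees table, the comma delimiter and the helper read_table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TableMetadata:
    columns: list         # (column_name, type_name) pairs
    location: str         # directory that holds the table's files
    delimiter: str = ","  # assumed field separator


# Toy "metastore": table name -> metadata (illustrative values only).
metastore = {
    "employees": TableMetadata(
        columns=[("name", "string"), ("age", "int")],
        location="/user/hive/warehouse/employees",
    )
}

# Toy "file system": directory -> raw lines of text, with no schema attached.
files = {
    "/user/hive/warehouse/employees": ["alice,30", "bob,25"],
}

# How each declared type name is converted when a line is read.
CASTS = {"string": str, "int": int}


def read_table(table_name):
    """Use the metadata to turn raw lines into typed rows."""
    meta = metastore[table_name]
    rows = []
    for line in files[meta.location]:
        fields = line.split(meta.delimiter)
        row = {
            name: CASTS[type_name](value)
            for (name, type_name), value in zip(meta.columns, fields)
        }
        rows.append(row)
    return rows


table_rows = read_table("employees")
print(table_rows)
```

The files themselves carry no column names or types; the table view exists only because the metadata says where the files are and how to interpret them.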

HIVE Architecture

Just like a SQL terminal, HIVE also has its own terminal (the Hive command-line interface). When you type a query in the HIVE terminal, the query is passed to the driver. This driver is a component of HIVE itself that manages the life cycle of a query; it should not be confused with the JDBC/ODBC drivers, which are what external client applications use to connect to HIVE. The driver then passes the query to the compiler. The compiler checks the syntax of the query, communicates with the metastore to get the necessary metadata, generates the execution plan and sends it back to the driver.

The driver sends the execution plan to the execution engine, which submits it to Hadoop as one or more jobs (classically MapReduce jobs). These jobs run in parallel on multiple machines and read the data from HDFS (Hadoop Distributed File System).
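
The flow of a query through these components can be sketched as a few Python functions. This is a deliberately tiny model, not Hive's real code: it only understands queries of the shape SELECT column FROM table, and the names METASTORE, STORAGE, compile_query, execute and driver are hypothetical.

```python
# Toy metastore: table name -> list of column names (hypothetical table).
METASTORE = {"employees": ["name", "dept"]}

# Toy storage standing in for the files that hold the table's rows.
STORAGE = {"employees": [("alice", "sales"), ("bob", "hr")]}


def compile_query(query, metastore):
    """Check the query shape, look up metadata and build a simple plan."""
    tokens = query.rstrip(";").split()
    # Only the shape "SELECT <column> FROM <table>" is handled here.
    if len(tokens) != 4 or tokens[0].upper() != "SELECT" or tokens[2].upper() != "FROM":
        raise ValueError("unsupported query shape")
    column, table = tokens[1], tokens[3]
    if table not in metastore:
        raise ValueError(f"unknown table: {table}")
    if column not in metastore[table]:
        raise ValueError(f"unknown column: {column}")
    return {"table": table, "column_index": metastore[table].index(column)}


def execute(plan, storage):
    """Run the plan against the stored rows (a real engine would launch jobs)."""
    return [row[plan["column_index"]] for row in storage[plan["table"]]]


def driver(query, metastore, storage):
    """Receive the query, have it compiled, then hand the plan to the engine."""
    plan = compile_query(query, metastore)
    return execute(plan, storage)


names = driver("SELECT name FROM employees;", METASTORE, STORAGE)
print(names)
```

Note how the compiler needs the metastore before any data is touched: a query against an unknown table or column fails at compile time, before the execution step starts.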
