Hive Configuration over Hadoop Platform

December 18, 2014

The Apache Hive™ data warehouse software facilitates querying and managing large data sets residing in distributed storage. Hive provides a mechanism to project structure onto this data and to query it using a SQL-like language called HiveQL. At the same time, this language allows traditional map/reduce programmers to plug in their own custom mappers and reducers when it is inconvenient or inefficient to express that logic in HiveQL.

Today we will walk through installing and configuring Hive, so that you can run SQL-like queries on the Hadoop platform.

Pre-requisites:
a. Before installing Hive, make sure Hadoop is already running on your cluster. If it is not, set that up first.
b. Download Hive from the Hive downloads page.
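
One way to confirm the first prerequisite is to ask HDFS for a listing of its root directory, which only succeeds when the NameNode can be reached. The sketch below assumes a POSIX shell. The function name check_hadoop and its argument are illustrative and not part of Hadoop.

```shell
#!/bin/sh
# check_hadoop: hypothetical helper, not part of Hadoop.
# $1 is the path to the hadoop launcher (defaults to $HADOOP_HOME/bin/hadoop).
check_hadoop() {
  hadoop_bin="${1:-${HADOOP_HOME:-}/bin/hadoop}"
  if [ ! -x "$hadoop_bin" ]; then
    echo "hadoop launcher not found: $hadoop_bin" >&2
    return 1
  fi
  # Listing the HDFS root fails if the NameNode cannot be reached.
  if "$hadoop_bin" fs -ls / >/dev/null 2>&1; then
    echo "HDFS is reachable"
  else
    echo "HDFS is not reachable; start Hadoop first" >&2
    return 1
  fi
}
```

For example, check_hadoop /path/to/hadoop/bin/hadoop prints "HDFS is reachable" when the cluster is up, and returns a non-zero status otherwise.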

Steps to configure Hive:
1. First, extract the hive-<version>.gz file.
2. Now, go to the Hive directory:

cd /path/to/hive/

3. Now run the following commands one by one:

export HIVE_HOME=$(pwd)

export PATH=$HIVE_HOME/bin:$PATH

export HADOOP_HOME=/path/to/hadoop/

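4. Now, create the /tmp and /user/hive/warehouse directories in HDFS and make them group-writable. To do that, go to the Hadoop directory:

cd /path/to/hadoop/

5. Then run the following commands. On Hadoop 2.x and later, add -p to the -mkdir commands so that the missing parent directories /user and /user/hive are created: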

bin/hadoop fs -mkdir /tmp

bin/hadoop fs -mkdir /user/hive/warehouse

bin/hadoop fs -chmod g+w /tmp

bin/hadoop fs -chmod g+w /user/hive/warehouse

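6. Finally, set the Hive home to its absolute path. This is the same variable as in step 3, so you only need to do this if it is not already set in your current shell: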

export HIVE_HOME=/path/to/hive
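
The HDFS commands from steps 4 and 5 can be sketched as a small script. This is a minimal sketch, assuming a POSIX shell and Hadoop 2.x or later (the -p flag of -mkdir, which creates missing parent directories, is not available in Hadoop 1.x). The names run, setup_hive_dirs and DRY_RUN are illustrative. With DRY_RUN=1 the commands are printed instead of executed, which makes it easy to review them first.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Hadoop or Hive.

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# setup_hive_dirs: create the two HDFS directories Hive expects
# and make them group-writable. Assumes HADOOP_HOME is set as in step 3.
setup_hive_dirs() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set" >&2
    return 1
  fi
  hadoop_bin="$HADOOP_HOME/bin/hadoop"
  for dir in /tmp /user/hive/warehouse; do
    run "$hadoop_bin" fs -mkdir -p "$dir" || return 1
    run "$hadoop_bin" fs -chmod g+w "$dir" || return 1
  done
}
```

Running DRY_RUN=1 setup_hive_dirs first shows the four commands. Running it without DRY_RUN executes them against the cluster.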

Congratulations, you are done with the configuration!

To start Hive, go to the Hive home:

cd /path/to/hive

And run the command:

bin/hive
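
To check the installation without opening the interactive prompt, the hive launcher can run a single statement through its -e option. The sketch below assumes a POSIX shell. The function name hive_smoke_test is illustrative and not part of Hive. On a fresh installation the output would typically include the default database.

```shell
#!/bin/sh
# hive_smoke_test: hypothetical helper, not part of Hive.
# $1 is the path to the hive launcher (defaults to $HIVE_HOME/bin/hive).
hive_smoke_test() {
  hive_bin="${1:-${HIVE_HOME:-}/bin/hive}"
  if [ ! -x "$hive_bin" ]; then
    echo "hive launcher not found: $hive_bin" >&2
    return 1
  fi
  # -e runs the quoted HiveQL statement and exits instead of opening the prompt.
  "$hive_bin" -e 'SHOW DATABASES;'
}
```

For example, hive_smoke_test /path/to/hive/bin/hive returns a non-zero status if the launcher is missing or the statement fails.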
