Hive Configuration over Hadoop Platform


The Apache Hive™ data warehouse software facilitates querying and managing large data sets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time, this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
Today we will walk through the installation and configuration of Hive, and look at what SQL-like queries offer on the Hadoop platform.
Prerequisites:
a. To install Hive, make sure Hadoop is already running on your cluster. If it is not, set that up first.
b. Download Hive from the Hive downloads page.
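Before moving on, it can help to confirm that the Hadoop client is reachable from your shell. The snippet below is a minimal sketch and not part of the original steps. The helper name `require_cmd` is made up for illustration, and it only checks that a command is on the PATH, not that the cluster daemons are healthy.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command is on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Example use (assumes the Hadoop bin directory is on your PATH):
#   require_cmd hadoop && hadoop version
#   hadoop fs -ls /     # lists the HDFS root if the cluster is reachable
```

If `require_cmd hadoop` reports the command as missing, the Hadoop bin directory is probably not on your PATH yet; the steps below call it through its full path instead.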
Steps to configure Hive:
1. First, extract the hive-<version>.gz file.
2. Now, go to Hive directory:
cd path/to/hive/
3. Now run the following commands one by one:
export HIVE_HOME=$(pwd)
export PATH=$HIVE_HOME/bin:$PATH
export HADOOP_HOME=/path/to/hadoop/
4. Now, create the /tmp and /user/hive/warehouse directories in HDFS. To do that, go to the Hadoop directory:
cd path/to/hadoop/
5. Then run the following commands (on Hadoop 2 and later, use -mkdir -p if /user/hive does not exist yet):
bin/hadoop fs -mkdir /tmp
bin/hadoop fs -mkdir /user/hive/warehouse
bin/hadoop fs -chmod g+w /tmp
bin/hadoop fs -chmod g+w /user/hive/warehouse
6. Finally, make sure HIVE_HOME points to your Hive directory:
export HIVE_HOME=/path/to/hive
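
The steps above can be collected into one small script. This is only a sketch under a few assumptions: the paths are the same placeholders used above, and `DRY_RUN`, `run`, `print_env` and `setup_hdfs_dirs` are names invented here for illustration. By default it only prints the commands it would run, so you can check them before touching HDFS.

```shell
#!/bin/sh
# Sketch of the setup steps as one script. Paths are placeholders.
HIVE_HOME="${HIVE_HOME:-/path/to/hive}"
HADOOP_HOME="${HADOOP_HOME:-/path/to/hadoop}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Emit the export lines from step 3 so they can be saved to a profile file.
print_env() {
    printf '%s\n' "export HIVE_HOME=$HIVE_HOME"
    printf '%s\n' 'export PATH=$HIVE_HOME/bin:$PATH'
    printf '%s\n' "export HADOOP_HOME=$HADOOP_HOME"
}

# Create the HDFS directories from steps 4 and 5 and make them group-writable.
# On Hadoop 2 and later, add -p after -mkdir if parent directories are missing.
setup_hdfs_dirs() {
    for dir in /tmp /user/hive/warehouse; do
        run "$HADOOP_HOME/bin/hadoop" fs -mkdir "$dir"
        run "$HADOOP_HOME/bin/hadoop" fs -chmod g+w "$dir"
    done
}

print_env
setup_hdfs_dirs
```

Variables set with `export` last only for the current shell session, so one option is to append the output of `print_env` to a profile file such as `~/.bashrc`. Run the script with `DRY_RUN=0` once the printed commands look right.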

Congratulations, you are done with the configuration!
To start Hive, go to the Hive home directory:
cd /path/to/hive
And run the command:
bin/hive
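
Once the prompt comes up, a quick way to confirm the setup is to run a single HiveQL statement non-interactively. The following is a rough sketch: the function name `hive_smoke_test` is invented for illustration, `HIVE_HOME` falls back to the placeholder path used above, and depending on your Hive version the metastore may need to be initialized before the statement succeeds.

```shell
#!/bin/sh
# Sketch: run one HiveQL statement as a smoke test.
# HIVE_HOME defaults to a placeholder path; adjust it to your install.
HIVE_HOME="${HIVE_HOME:-/path/to/hive}"

hive_smoke_test() {
    hive_bin="$HIVE_HOME/bin/hive"
    if [ ! -x "$hive_bin" ]; then
        echo "hive not found or not executable at $hive_bin" >&2
        return 1
    fi
    # -e runs the quoted HiveQL string and exits instead of opening the prompt.
    "$hive_bin" -e 'SHOW TABLES;'
}

# Example use:
#   hive_smoke_test && echo "Hive answered the query"
```

On a fresh install the statement should return without listing any tables; an error at this point usually means one of the environment variables or HDFS directories from the steps above is missing.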
