Apache Hive is a data warehouse infrastructure for querying and managing large data sets stored in a distributed storage system. Hive is built on top of Hadoop and was originally developed at Facebook.
Hadoop (read in detail) includes the Hadoop Distributed File System (HDFS) and MapReduce. Because it is not practical to store a very large amount of data on a single node, Hadoop uses a file system called HDFS, which splits the data into many smaller parts and distributes each part redundantly across multiple nodes. MapReduce is a software framework for the analysis and transformation of very large data sets; Hadoop uses it for distributed computation.
Hive manages data stored in HDFS and, to a user, looks very much like a traditional database with SQL access. It provides a way to query the data using an SQL-like language called HiveQL (Hive Query Language). Internally, a compiler translates HiveQL statements into MapReduce jobs, which are then submitted to the Hadoop framework for execution.
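To make this concrete, here is a small, hypothetical HiveQL query; the table name page_views is invented for illustration. An aggregation with GROUP BY like this is exactly the kind of statement the compiler turns into map and reduce phases.

```shell
# Write a sample HiveQL query to a file (table name page_views is hypothetical)
cat > sample.hql <<'EOF'
-- Hive compiles this aggregation into a MapReduce job:
-- the map phase emits (ip, 1) pairs, the reduce phase sums them per ip.
SELECT ip, COUNT(*) AS hits
FROM page_views
GROUP BY ip;
EOF

# On a working installation, run it with:
#   hive -f sample.hql
cat sample.hql
```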
Because Hive is built on Hadoop and MapReduce, there are several key differences between HiveQL and SQL. First, Hadoop is intended for long sequential scans, so Hive queries tend to have very high latency (many minutes). This means that Hive is not appropriate for applications that need very fast response times, as you would expect from a database such as DB2. Second, Hive is read-oriented and therefore not appropriate for transaction processing that typically involves a high percentage of write operations.
Suggested Read:
3 Things you Didn’t know Big Data could Do
Apache Hadoop – Introduction and architecture
Step 1: Download and extract Hive from the official site.
Step 2: Add the Hive environment variables to ~/.bashrc (open it with gedit ~/.bashrc or any editor):
# Hive conf starts
export HIVE_INSTALL=/home/soft/hive
export PATH=$PATH:$HIVE_INSTALL/bin
# Hive conf ends
Reload the bashrc file with: source ~/.bashrc
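A quick way to confirm the variables took effect after reloading; the paths below mirror the install location used in this article:

```shell
# Re-create the Hive environment from Step 2 and verify it.
export HIVE_INSTALL=/home/soft/hive
export PATH=$PATH:$HIVE_INSTALL/bin

echo "$HIVE_INSTALL"    # prints /home/soft/hive

# Check that Hive's bin directory is actually on PATH.
case ":$PATH:" in
  *":$HIVE_INSTALL/bin:"*) echo "hive bin is on PATH" ;;
  *) echo "hive bin missing from PATH" >&2 ;;
esac
```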
Step 3: Tell Hive where Hadoop is installed:
export HADOOP_HOME=/home/soft/hadoop
Step 4: Create the Hive warehouse directory in HDFS and make it group-writable (note that /user/hive/warehouse lives in HDFS, so the permissions are changed with hadoop fs, not with a local chmod):
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod -R 775 /user/hive/warehouse
Step 5: In hive-site.xml, configure the metastore connection:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/home/soft/hive/metastore_db;create=true</value>
  <description>
    JDBC connect string for a JDBC metastore.
    To use SSL to encrypt/authenticate the connection, provide a database-specific SSL flag in the connection URL.
    For example, jdbc:postgresql://myhost/db?ssl=true for a postgres database.
  </description>
</property>
and add the lines below at the start of hive-site.xml (inside the configuration element):
<property>
  <name>system:java.io.tmpdir</name>
  <value>/tmp/hive/java</value>
</property>
<property>
  <name>system:user.name</name>
  <value>${user.name}</value>
</property>
Step 6: Start the Hive shell with the hive command and enjoy querying. Let me know if you face any issues.
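As a first session, here is a minimal end-to-end sketch, assuming Hive is configured as above; the table name employees and the file employees.csv are made up for the example.

```shell
# Create a tiny local data file (contents are invented sample data).
cat > employees.csv <<'EOF'
1,Alice,Engineering
2,Bob,Sales
EOF

# Write a HiveQL script: create a table, load the file, run an aggregation.
cat > demo.hql <<'EOF'
CREATE TABLE IF NOT EXISTS employees (
  id INT,
  name STRING,
  dept STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA LOCAL INPATH 'employees.csv' INTO TABLE employees;

SELECT dept, COUNT(*) FROM employees GROUP BY dept;
EOF

# On a working installation, run it with:
#   hive -f demo.hql
```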