Learn how to work with the Hive interactive shell.
Learn how to create tables in Hive.
Learn how to load data into Hive tables.
Learn how to run basic Hive queries.
This section shows the basic usage of Apache Hive. Hive provides a SQL-like language called HiveQL and runs on top of Hadoop. Instead of writing raw MapReduce programs, Hive lets you perform data-warehouse tasks with a simple, familiar query language. After completing this section, you will be able to use HiveQL to query big data.
In the sample code below we will continue to use the same patient event tuple data. First, start the Hive CLI interactive shell by typing hive on the command line.
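Assuming Hive is installed and on your PATH, launching the shell looks like this:

```bash
# Start the Hive interactive shell; you will get a hive> prompt
hive
```

Type quit; or exit; at the prompt to leave the shell.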
Before loading data, we first need to define a table, just as we would when working with a relational database server such as MySQL.
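A minimal table definition for the patient event data might look like the sketch below. The table name (events), column names, and the comma delimiter are assumptions; adjust them to match your actual data file:

```sql
-- Hypothetical schema for the patient event tuples:
-- one row per (patient_id, event_id, timestamp, value)
CREATE TABLE events (
  patient_id STRING,
  event_id   STRING,
  event_time STRING,
  value      DOUBLE)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```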
You can check existing tables and their schemas with the commands SHOW TABLES; and DESCRIBE table_name; respectively.
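For example, with a hypothetical events table already defined:

```sql
SHOW TABLES;        -- lists all tables in the current database
DESCRIBE events;    -- shows column names and types for the events table
```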
Next, we'll load data into the table.
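One common way is the LOAD DATA statement, which copies a file into the table's storage location. The file path and table name below are assumptions for illustration:

```sql
-- LOCAL means the path is on the local file system, not HDFS;
-- OVERWRITE replaces any data already in the table
LOAD DATA LOCAL INPATH 'data/events.csv'
OVERWRITE INTO TABLE events;
```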
With the data loaded you can run familiar SQL statements like:
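For instance, assuming an events table keyed by patient_id (a hypothetical schema), you could count events per patient:

```sql
-- Top 10 patients by number of recorded events
SELECT patient_id, COUNT(*) AS event_count
FROM events
GROUP BY patient_id
ORDER BY event_count DESC
LIMIT 10;
```

Behind the scenes, Hive compiles this query into one or more MapReduce jobs.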
You can also save query results to a local directory (in the local file system):
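A sketch of writing results out, with an assumed output path and table name:

```sql
-- Writes the query result as files under output/event_counts
-- on the local file system (omit LOCAL to write to HDFS instead)
INSERT OVERWRITE LOCAL DIRECTORY 'output/event_counts'
SELECT patient_id, COUNT(*)
FROM events
GROUP BY patient_id;
```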
Besides running commands with the interactive shell, you can also run a script in batch mode automatically. For example, in the sample/hive folder, you can run the entire sample.hql script with the command:
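The -f flag tells the Hive CLI to execute a script file instead of starting the interactive shell:

```bash
# Run the script in batch mode from the sample/hive folder
hive -f sample.hql
```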
This script simply contains all of the commands that we ran in the shell, with one additional statement to drop the existing table if necessary:
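The extra statement is a guard so the script can be re-run cleanly (table name assumed to match the earlier examples):

```sql
-- Remove the table if it already exists, so CREATE TABLE won't fail
DROP TABLE IF EXISTS events;
```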
Furthermore, it's also possible to run Hive as a server (HiveServer2) and connect to it over JDBC or with its Beeline client.
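A minimal sketch of that setup; the host and port are assumptions (10000 is HiveServer2's conventional default):

```bash
# Start the Hive server in one terminal
hiveserver2

# Connect from another terminal with the Beeline client
beeline -u jdbc:hive2://localhost:10000
```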
Hive translates queries into a series of MapReduce jobs, so it is not suitable for low-latency, real-time use cases. Alternative tools inspired and influenced by Hive, such as Cloudera Impala and Spark SQL, have been getting more attention lately.