Connect to Apache Hive over ODBC

Mrudula Madiraju
4 min readNov 20, 2020
ImageSource: Unsplash

Introduction

Many Business Intelligence(BI) and Reporting tools such as MicroStrategy, Tableau, SPSS require ODBC connectivity to databases for running analytical SQL queries. This blog is a walkthrough of how you can connect to IBM Analytics Engine’s Hive endpoint using any of the standard Hive ODBC drivers. Analytics Engine is built on top of Hortonworks Data Platform 3.1.5. (Apache Hive is at version 3.1.0)
The configuration samples include connecting from the following clients:
a) Linux system CLI
b) From SPSS Modeler
c) From MicrosoftExcel

ODBC Drivers — Overview

Some of the enterprise data analytics platforms come with inbuilt drivers and connectors that can connect to all of the standard data sources. For example, you can see that MicroStrategy comes with Apache Hive ODBC drivers that can connect to Hortonworks HDP amongst a host of other platforms such as EMR, CDH, MapR etc. So also for Tableau — you will find a wide choice of supported data sources. For other tools such as SPSS or say a Microsoft Excel- you may need to download an ODBC driver upfront and install it on the client side.

As mentioned in the Apache Hive docs, the ODBC driver is no longer part of the Apache Hive for HiveServer2. However, you do find plenty of Hive ODBC drivers that you can download for use — most require registration and some offer a trial period before you go for the paid version. So you can pick a driver according to your requirements. Instructions for installing the ODBC driver is specific to the driver that you select and comes with detailed documentation. Driver installation instructions are beyond the scope of this article.

Configuring the driver typically requires making entries in the odbc.ini or the client side UI where you can define the parameter values.

Configuring the ODBC driver properties :- Linux

Here is a sample odbc.ini file used to configure from a Linux system

#Path to where your driver library has been installed
Driver=/usr/lib/hive/lib/native/Linux-amd64-64/xxxhiveodbc64.so
#Fully Qualified hostname of your AE cluster. Change it for your cluster
Host=<chs-mmm-007-mn001.us-south.ae.appdomain.cloud>
#All endpoints in Analytics Engine are routed through Knox
Port=8443
#HTTPPath - the endpoint for AE is through Knox gateway
HTTPPath=gateway/default/hive
#Database or Schema of your choice
Database/Schema=<default>
#The Thrift Transport mode. Some drivers require you to give a number here
ThriftTransportMode=<HTTP/2>
#Specifies whether to connect directly to HiveServer2 or using Zookeeper. Enter the value for direct mode ie. no zookeeper
ServiceDiscoveryMode=<None/0>
#Analytics Engine uses HiveServer2
HiveServerType=2
#Authentication Mechanism
AuthMech=<User/Password>
#Uses SSL?
SSL=<Yes/1>
#For Analytics Engine, other parameters such as Kerberos ClientCert, TwoWaySSL etc are not applicable. The DSN also can be specified here and later used from isql as shown below

Query table using the isql command by passing the DSN, user and password

Configuring the ODBC properties :- SPSS Modeler

The parameters are exactly the same as given in the odbc.ini example. The screen captures are self explanatory.

After configuring, we can fetch data from the table as shown here.

Configuring the ODBC properties :- Microsoft Excel

At this point you need to add the DSN and other details

You should have installed the driver by this step and it will automatically show up in the prompt.

And voila! you are ready to execute the queries.

Conclusion

This article is a quick how-to to show that Apache Hive tables can be accessed for SQL reads. Do explore with further functional tests and benchmarking to make sure that ODBC is a right fit for your project requirements.

--

--

Mrudula Madiraju

Dealing with Data, Cloud, Compliance and sharing tit bits of epiphanies along the way.