How to Provision an IBM Cloud Analytics Engine Instance
Terraform is open source software from HashiCorp that lets you provision and manage your cloud provider’s resources using simple, declarative, JSON-like configuration files that define your requirements.
In general, Terraform simplifies provisioning, especially when you have to deal with multiple clouds and multiple environments. Almost all cloud and other infrastructure providers offer predefined templates and plugins that make it easy to use and navigate, so that you, as a DevOps engineer, don’t have to get into the details of the API…
Many Business Intelligence (BI) and reporting tools, such as MicroStrategy, Tableau, and SPSS, require ODBC connectivity to databases for running analytical SQL queries. This blog is a walkthrough of how you can connect to IBM Analytics Engine’s Hive endpoint using any of the standard Hive ODBC drivers. Analytics Engine is built on top of Hortonworks Data Platform 3.1.5 (Apache Hive is at version 3.1.0).
The configuration samples cover connecting from the following clients:
a) Linux system CLI
b) SPSS Modeler
c) Microsoft Excel
One question that’s often asked is: “How can I modify or delete data that is on S3 or IBM Cloud Object Storage?” The answer is surprisingly simple. You can do that with the following caveats:
- It works only with Hive transactional tables.
- It is supported only for the Hive ORC format.
- It is supported only from Hive, not from Spark. So you cannot use Spark SQL to create or work with these tables. Yet.
- It is supported only for managed tables.
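The caveats above can be sketched as HiveQL statements, built here as Python strings so they can be passed to whichever Hive client you use. This is a minimal sketch, not the article's own code: the table name, columns, and values are hypothetical, and the storage location on S3 or Cloud Object Storage is deliberately left out.

```python
# Hypothetical table name, used only for illustration.
TABLE = "sales_acid"


def create_table_sql(table: str) -> str:
    """Build DDL for a managed (no EXTERNAL keyword), ORC, transactional table."""
    return (
        f"CREATE TABLE {table} (id INT, amount DOUBLE) "
        "STORED AS ORC "
        "TBLPROPERTIES ('transactional'='true')"
    )


def update_sql(table: str, row_id: int, amount: float) -> str:
    """Build a row-level UPDATE, which Hive accepts only on transactional tables."""
    return f"UPDATE {table} SET amount = {amount} WHERE id = {row_id}"


def delete_sql(table: str, row_id: int) -> str:
    """Build a row-level DELETE, which Hive accepts only on transactional tables."""
    return f"DELETE FROM {table} WHERE id = {row_id}"


statements = [
    create_table_sql(TABLE),
    update_sql(TABLE, 1, 9.5),
    delete_sql(TABLE, 2),
]

for statement in statements:
    print(statement + ";")
```

The statements would be run from a Hive client such as Beeline rather than from Spark SQL, in line with the third caveat.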
Apache Hive supports transactions — which means you can…
As a data scientist, you want to concentrate on the business logic of your program and not worry about the stability and availability of the compute engine that runs your application. That is the ideal world. Practically speaking, several infrastructure dependencies can add up and cause disruptions to your Spark or Hadoop jobs.
That is where monitoring with alerts and notifications plays a key role in building a solid, enterprise-grade application system. A typical organization has several environments across…
Amadeus, the travel technology company set up by a group of European airlines to enable travel agents to carry out flight ticketing online, has turned to a NoSQL database technology to enable travellers to ask complex questions about their journey. Travel site Kayak is using Amadeus Instant Search technology to increase conversion rates from “looking” to “booking”.
The company chose NoSQL database MongoDB to help it build an “instant search” application that can browse billions of travel options across multiple criteria in real time.
Above are excerpts from an article on ComputerWeekly.com that gives an interesting take on why the…
This story is based on a customer use case: how you can combine Serverless with Managed Services to construct analytic workflows. We’ll see why such a requirement arises, and the steps to meet it.
“You should use functions as the glue, containing your business logic, between managed services that are providing the heavy lifting that forms the majority of your application.”
The benefits of using private endpoints in IBM Cloud are threefold:
1. Your data does not pass through the public network, so it is more secure.
2. You get better performance.
3. Overall, you save on cost when the data transfer is internal.
This article demonstrates how you can architect a solution around private endpoints, as far as possible, to leverage these benefits.
This article and this diagram concern the upstream and downstream services usually associated with IBM Analytics Engine. In the picture, there are two different instances of Analytics Engine. The Analytics…
Safeguarding business data is one of the critical pieces of any application design. It is important that data is accessed safely and securely, and only with authorized credentials. IP whitelisting is an additional layer of security that can be implemented to allow only trusted hosts to access your data.
This writeup discusses how you can whitelist the IP addresses (both private and public) of your IBM Analytics Engine cluster so that it can access your data stored either on-premises or in IBM Cloud Object Storage.
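The idea behind IP whitelisting can be illustrated with a short check using Python's standard `ipaddress` module. This is only a sketch of the concept, not how IBM Cloud Object Storage or an on-premises firewall implements it; the addresses below are placeholders (a private range and a documentation-only public address), not real cluster IPs.

```python
import ipaddress

# Hypothetical allowlist: placeholder values standing in for the cluster's
# private and public addresses. Replace with your own cluster's IPs.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/24"),      # assumed private range
    ipaddress.ip_network("203.0.113.10/32"),  # assumed public address
]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any allowlisted network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


for candidate in ("10.0.0.17", "203.0.113.10", "198.51.100.7"):
    print(candidate, "allowed" if is_allowed(candidate) else "denied")
```

Only hosts whose address matches an entry are treated as trusted; every other host is denied by default.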
In this blog you will learn how and why to make your IBM Analytics Engine (1.2) cluster stateless by keeping your data and (Hive) metadata outside of the cluster. We use IBM Cloud Object Storage and Databases for PostgreSQL.
Separating storage from compute is a recommended paradigm that brings in flexibility and optimization of resources. The decoupling allows you to scale up (or scale down) either of the two, independently, without impacting the other. Specifically in the case of IBM Analytics Engine, it allows you to get into the cattle-vs-pets way of thinking. Need to move from a cluster on…
In this tutorial you will learn how analytics can be performed on a shared compute engine and shared cloud object storage, but with separate jobs and job configurations. This solution leverages the Analytics Engine and Cloud Object Storage services to perform Spark analytics in separate contexts, whilst sharing the underlying engine and storage instances.
This is a fictional scenario of an analytics team (Sportify Inc) that has two data scientists and a data engineer. To save on costs and administration, the company wants to use one instance of the processing server and a common data lake for all of their…
Senior Consultant, IBM Cloud. Sharing titbits of epiphanies...