Whitelisting Analytics Engine IPs: Controlling access to external and internal data

Mrudula Madiraju
6 min read · Mar 28, 2020

Overview

Safeguarding business data is a critical part of any application design. Data must be accessed safely and securely, and only with authorized credentials. IP whitelisting is an additional layer of security that allows only trusted hosts to access your data.

This writeup discusses how you can whitelist the IP addresses (both private and public) of your IBM Analytics Engine cluster so that it can access data stored either on-prem or in IBM Cloud Object Storage (COS).

PART 1: ANALYTICS ENGINE API TO GET THE PUBLIC & PRIVATE IPs

Before we get started with whitelisting, let us first get the public and private IPs of the Analytics Engine cluster. Use the Get Details of Analytics Engine API for this. You will need to pass the instance GUID of your Analytics Engine service and a bearer token in the authorization header.

curl --request GET \
  --url https://api.us-south.ae.cloud.ibm.com/v2/analytics_engines/<<instance_guid>> \
  --header 'authorization: Bearer eyJraWQiOiIyMDIwMDMyN<<...>>' \
  --header 'content-type: application/json'
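For scripted use, the same request can be built in Python. This is a minimal sketch that uses only the standard library and the us-south URL from the curl command above; the function names are hypothetical, and obtaining the IAM bearer token is left out.

```python
import json
import urllib.request

# Base URL taken from the curl command in this post (us-south region).
API_BASE = "https://api.us-south.ae.cloud.ibm.com/v2/analytics_engines"


def build_details_request(instance_guid, iam_token):
    """Build (but do not send) the GET request for the cluster details."""
    return urllib.request.Request(
        f"{API_BASE}/{instance_guid}",
        headers={
            "Authorization": f"Bearer {iam_token}",
            "Content-Type": "application/json",
        },
        method="GET",
    )


def fetch_details(instance_guid, iam_token, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_details_request(instance_guid, iam_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request construction from sending makes it possible to inspect the URL and headers without touching the network.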

The resulting JSON contains the public and private IPs. In the truncated snapshot of the response, you can see the public IP of the management host (169.X.Y.Z) and of the data host (169.A.B.C). Note that the three management nodes (mn001, mn002 and mn003) all run on the same underlying management VM, so they share the same IP. Also note down the private IPs, which start with 10.
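Since the exact layout of the response is not reproduced here, the sketch below makes no assumption about field names. It walks whatever JSON comes back, collects every string that parses as an IPv4 address, and separates private addresses (such as the 10.x ones) from public ones. The function names are hypothetical.

```python
import ipaddress
import json


def collect_ips(node, found=None):
    """Recursively collect every string value that is a valid IPv4 address."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for value in node.values():
            collect_ips(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_ips(item, found)
    elif isinstance(node, str):
        try:
            found.add(ipaddress.IPv4Address(node))
        except ValueError:
            pass  # not an IP address, ignore
    return found


def split_public_private(response_text):
    """Return (public_ips, private_ips) as sorted lists of strings."""
    ips = sorted(collect_ips(json.loads(response_text)))
    public = [str(ip) for ip in ips if not ip.is_private]
    private = [str(ip) for ip in ips if ip.is_private]
    return public, private
```

Because the management nodes share one IP, collecting into a set also removes the duplicates.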

PART 2: WHITELISTING OF PUBLIC IPs OF ANALYTICS ENGINE TO ON-PREM DATA

a. Simulating an on-prem database

For the purpose of this demo, the following steps were executed:

  • Provisioned a basic EC2 instance on AWS
    - Installed Docker and brought up a MySQL container
    - Created a database “test” with a table called “employees” and added some data to it
    - Verified that the table can be accessed from the EC2 instance using the mysql client

b. Allowing incoming access to port 3306

By default, all inbound traffic to an EC2 instance is denied. For this demo, the inbound rules were first modified to allow incoming traffic from *any* IP to port 3306. This is done by selecting 0.0.0.0/0 as the source.

c. Accessing this data from Analytics Engine instance #1

From IBM Analytics Engine instance #1, “chs-ydt-xxx-mn003.us-south.ae.appdomain.cloud”, I can access the MySQL DB running on the external (simulated on-prem) host using the mysql client.

d. Accessing this data from Analytics Engine instance #2

Similarly, from a different IBM Analytics Engine instance #2, “chs-dfh-yyy-mn003.us-south.ae.appdomain.cloud”, I can access the same MySQL DB using the mysql client.

e. Restricting access to the IPs of one Analytics Engine instance only

So far, the MySQL DB on-prem is accessible from both Analytics Engine instances. Next, we edit the inbound rules to allow only the public IPs of Analytics Engine instance #1.
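The inbound rule can also be expressed as data. The sketch below builds a rule set in the `IpPermissions` shape that the AWS EC2 API uses, with one /32 entry per public IP. The function name, the description text, and the security group ID in the comments are hypothetical placeholders, and this is not necessarily how the rule was created for the demo.

```python
import ipaddress


def mysql_ingress_rules(public_ips, port=3306, description="Analytics Engine instance #1"):
    """Build EC2-style IpPermissions allowing TCP `port` from the given IPs only."""
    ranges = [
        # ip_address() raises ValueError on malformed input before any rule is built.
        {"CidrIp": f"{ipaddress.ip_address(ip)}/32", "Description": description}
        for ip in public_ips
    ]
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": ranges,
    }]


# Applying the rules with boto3 (not executed here; "sg-..." is a placeholder):
#   ec2 = boto3.client("ec2")
#   ec2.authorize_security_group_ingress(GroupId="sg-...", IpPermissions=rules)
# The earlier 0.0.0.0/0 rule must also be removed, for example with
#   ec2.revoke_security_group_ingress(...), or the restriction has no effect.
```

Using /32 ranges keeps the rule limited to exactly the hosts reported for the cluster rather than a wider subnet.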

f. Access denied for Analytics Engine instance #2

After this rule is applied at the destination, the first instance can still access the MySQL DB, while the same connection attempt from the second instance fails with an error. We have thus successfully restricted access to the (simulated) on-prem DB to the specific public IPs of one Analytics Engine instance.
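A quick way to confirm the effect of the rule from each cluster, without needing the mysql client, is a plain TCP probe against port 3306. This is a minimal sketch with a hypothetical function name. It only checks that the port is reachable, not that MySQL authentication succeeds.

```python
import socket


def can_reach(host, port=3306, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run from the whitelisted instance, the probe is expected to succeed; from any other host, a blocked connection shows up as a timeout or refusal and the function returns False.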

PART 3: WHITELISTING OF PRIVATE IPs OF ANALYTICS ENGINE TO IBM COS

When you connect from Analytics Engine to IBM COS, it is strongly recommended to use the private endpoint of COS, both for better performance and to save on cost. In this scenario, therefore, we need to whitelist the private IPs. This section shows how you can use the private IPs of the Analytics Engine instance obtained in Part 1 to whitelist access to your bucket.

a. Accessing COS on the private endpoint from Analytics Engine #1

Sample Scala code snippet that accesses COS from Analytics Engine #1

b. Accessing COS on the private endpoint from Analytics Engine #2

Sample Scala code snippet that accesses COS from Analytics Engine #2

As you can see, by default, both instances are able to access the COS data.

c. Whitelisting Private IPs against bucket in COS

IBM COS has fine-grained access control policies. You can use either HMAC authentication or IAM key-based authentication, and you can restrict access down to the bucket level with varying levels of access such as reader or writer. Whitelisting the IPs that can access the bucket is another layer of security. The IBM COS documentation covers this in detail.

In this example, the private IPs of only the first instance have been added as allowed addresses for the bucket “matrix”. Access from all IPs other than those specified is denied.
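The allowed-IP list can also be prepared programmatically. The sketch below only builds a JSON body. The `firewall` / `allowed_ip` field names are an assumption based on the IBM COS Resource Configuration API and should be checked against the current COS documentation before use. The function name is hypothetical, and the bucket may equally be configured from the COS console.

```python
import ipaddress
import json


def bucket_firewall_payload(private_ips):
    """Build a JSON body listing the private IPs allowed to reach a bucket.

    Field names ("firewall", "allowed_ip") are an assumption; verify them
    against the IBM COS documentation.
    """
    allowed = []
    for ip in private_ips:
        address = ipaddress.ip_address(ip)
        if not address.is_private:
            # Only private addresses make sense for the COS private endpoint.
            raise ValueError(f"{ip} is not a private address")
        allowed.append(str(address))
    return json.dumps({"firewall": {"allowed_ip": allowed}})
```

Rejecting public addresses up front guards against accidentally mixing the two IP sets returned for the cluster.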

d. Access denied for the Analytics Engine instance #2

This time when accessing the bucket from the second instance, you will get a 403 Forbidden exception.

CONCLUSION

Analytics Engine exposes both its public and private IPs, which you can whitelist in firewalls or IP table rules to restrict access to your data.

Note that the IP addresses are specific to each cluster. Each time you provision a new cluster, you need to obtain its IP addresses and whitelist them again. There is no fixed range of IP addresses shared by all Analytics Engine instances, so the addresses change with every newly provisioned cluster.

If you have long-running clusters, making manual whitelisting changes once in a while should be easy. For clusters that are frequently created and discarded, you can programmatically get the IPs of each provisioned Analytics Engine cluster and set them up for whitelisting.

Just as we have demonstrated inbound rules that admit a specific Analytics Engine instance to the on-prem resources, you can also, depending on your organization's policies, configure outbound rules so that on-prem hosts can reach only the Analytics Engine instance of interest.
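For frequently recreated clusters, one way to automate the update is to compare the IPs currently whitelisted with those of the newly provisioned cluster and apply only the difference. This is a minimal sketch with a hypothetical function name. How the two lists are obtained, and how the changes are applied, depends on the firewall in question.

```python
def plan_allowlist_changes(current_ips, desired_ips):
    """Return which IPs to add and which to remove to reach the desired list."""
    current, desired = set(current_ips), set(desired_ips)
    return {
        "add": sorted(desired - current),      # IPs of the new cluster
        "remove": sorted(current - desired),   # IPs of discarded clusters
    }
```

Removing stale entries matters as much as adding new ones, because an address released by a discarded cluster may later be assigned to someone else.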

Stay Secure. Stay Safe.
