The benefits of using private endpoints in IBM Cloud are threefold:
1. Your data does not pass through the public network, so it is more secure.
2. You can get better performance, since the traffic stays within the IBM Cloud network.
3. You can save on cost, because the data transfer stays internal.
This article shows how you can architect a solution that uses private endpoints wherever possible, so that you can take advantage of these benefits.
This article, and the accompanying diagram, focus on the upstream and downstream services usually associated with IBM Analytics Engine. The diagram shows two instances of Analytics Engine: Analytics Engine #1 has only public endpoints, and Analytics Engine #2 has only private endpoints. Bear with me as we zigzag around the diagram, not necessarily following the arrows in alphabetical order.
- As arrow (c) shows, you can connect to the public Analytics Engine cluster (#1) and submit jobs (for example, Spark jobs through Livy, or Jupyter notebooks) from your on-premises servers, or even from your Mac, over the public network.
- All the downstream services typically used by IBM Analytics Engine (such as IBM Cloud Object Storage for storing data, PostgreSQL for storing the Hive metastore, and IBM Log Analysis with LogDNA for externalizing the application logs) should preferably be provisioned with private endpoints. Arrows (g), (h), (i), (j), (k), and (l) indicate that all of this communication, whether it comes from the public or the private Analytics Engine instance, should stay internal, within the IBM Cloud network.
- For the private-endpoint instance, Analytics Engine #2, you can submit jobs either from the Watson Studio service using notebooks (e), or, for example, run Spark jobs through Livy from a Virtual Server instance (f).
In both cases, however, you connect to these upstream components (Watson Studio and the Virtual Server instance) over the public network (a) and (b).
- From Watson Studio, you can connect to Analytics Engine #1 over the public network (d) or, as mentioned previously, to Analytics Engine #2 over the private network (e).
- For (b), you can apply security group rules to the Virtual Server instance so that it accepts connections only from specific IP addresses, which tightens security further. For example, in the snapshot below, a security group has been defined that allows access only on port 22 and only from a specific IP address.
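For arrows (g) through (l), keeping traffic to IBM Cloud Object Storage on the private network largely comes down to which endpoint your Spark jobs are configured with. The Python sketch below assembles such settings. It assumes the `s3.private.<region>.cloud-object-storage.appdomain.cloud` naming pattern for regional private COS endpoints and the Stocator `fs.cos.<service>.*` Hadoop property names; the service name `mycos`, the region, and the API key are placeholders. Check the endpoint for your bucket's actual location in the COS documentation before relying on it.

```python
from __future__ import annotations

# Sketch only. Assumptions to verify against the IBM COS and Stocator docs:
# - regional private endpoints follow s3.private.<region>.cloud-object-storage.appdomain.cloud
# - Stocator reads fs.cos.<service>.endpoint and fs.cos.<service>.iam.api.key


def private_cos_endpoint(region: str) -> str:
    """Return the (assumed) private regional COS endpoint URL for a region."""
    return f"https://s3.private.{region}.cloud-object-storage.appdomain.cloud"


def stocator_conf(service: str, region: str, api_key: str) -> dict[str, str]:
    """Spark properties (spark.hadoop.* prefix) for a hypothetical COS service name."""
    prefix = f"spark.hadoop.fs.cos.{service}"
    return {
        f"{prefix}.endpoint": private_cos_endpoint(region),
        f"{prefix}.iam.api.key": api_key,
    }


if __name__ == "__main__":
    # "mycos" and "<your-api-key>" are placeholders, not real values.
    conf = stocator_conf("mycos", "us-south", "<your-api-key>")
    for key, value in conf.items():
        print(f"{key}={value}")
    # Objects would then be addressed as cos://<bucket>.mycos/<object>
```

Because these properties are plain key/value pairs, the same settings could be passed as `--conf` options at job submission or placed in the cluster's Spark configuration.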
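For arrow (f), Spark jobs go from the Virtual Server instance to the Livy REST API of the private Analytics Engine instance. Below is a minimal Python sketch that uses only the standard library. It builds, but does not send, a Livy `POST /batches` request; the `file`, `className`, and `args` body fields and the `X-Requested-By` header come from Apache Livy. The host `private-cluster.example.com`, the `/gateway/default/livy/v1` path, the credentials, and the application file are hypothetical placeholders. Take the real Livy endpoint and credentials from your cluster's service credentials.

```python
from __future__ import annotations

import base64
import json
import urllib.request


def build_livy_batch_request(
    livy_url: str,
    user: str,
    password: str,
    app_file: str,
    class_name: str | None = None,
    args: list[str] | None = None,
) -> urllib.request.Request:
    """Build (but do not send) a Livy POST /batches request."""
    body: dict = {"file": app_file}
    if class_name:
        body["className"] = class_name  # needed for JVM jobs, not for PySpark files
    if args:
        body["args"] = args
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Livy requires this header when its CSRF protection is enabled.
            "X-Requested-By": "livy",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_livy_batch_request(
        "https://private-cluster.example.com:8443/gateway/default/livy/v1",  # hypothetical
        "clsadmin-placeholder",
        "password-placeholder",
        "cos://mybucket.mycos/jobs/wordcount.py",  # hypothetical application file
        args=["cos://mybucket.mycos/input.txt"],
    )
    print(req.get_method(), req.full_url)
    # Sending it with urllib.request.urlopen(req) only works from a machine that
    # can reach the private endpoint, such as the Virtual Server instance.
```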
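The security group described for (b) admits SSH (TCP port 22) from a single IP address only. The Python sketch below models how such an allow-list is evaluated. It assumes default-deny semantics for inbound traffic, so a connection is accepted only if some rule matches its protocol, destination port, and source address. This is not the IBM Cloud API: the `Rule` type is hypothetical, and `203.0.113.10` is a documentation address standing in for your own IP.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical inbound rule: protocol, port range, allowed source network."""

    protocol: str
    port_min: int
    port_max: int
    remote: str  # single IP or CIDR, e.g. "203.0.113.10" or "203.0.113.0/24"

    def matches(self, protocol: str, port: int, source_ip: str) -> bool:
        network = ipaddress.ip_network(self.remote, strict=False)
        return (
            protocol == self.protocol
            and self.port_min <= port <= self.port_max
            and ipaddress.ip_address(source_ip) in network
        )


def is_allowed(rules: list[Rule], protocol: str, port: int, source_ip: str) -> bool:
    """Default-deny (assumed): traffic is permitted only if at least one rule matches."""
    return any(rule.matches(protocol, port, source_ip) for rule in rules)


# SSH only, and only from one (placeholder) address.
ssh_only_from_my_ip = [Rule("tcp", 22, 22, "203.0.113.10")]

if __name__ == "__main__":
    print(is_allowed(ssh_only_from_my_ip, "tcp", 22, "203.0.113.10"))    # True
    print(is_allowed(ssh_only_from_my_ip, "tcp", 22, "198.51.100.7"))    # False
    print(is_allowed(ssh_only_from_my_ip, "tcp", 8443, "203.0.113.10"))  # False
```

Widening `remote` to a CIDR block such as `203.0.113.0/24` would admit a whole office range while still keeping every other port closed.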
Quick Tip: If you need to access the Ambari web UI of your private Analytics Engine instance, you can install a VNC server on the Virtual Server instance and access it from outside.
To get started with provisioning Analytics Engine clusters with private endpoints (also known as Cloud Service Endpoints, or CSE), check out the instructions here.
To set up security groups for a Virtual Server instance, refer to the instructions here.
Be Private, Stay Isolated. Stay Safe.