
Unity Catalog API & Iceberg REST Catalog API in watsonx.data

Jan 2, 2025 · 7 min read

New REST APIs: Unity & Iceberg REST Catalog…

With watsonx.data version 2.1, we have introduced support and implementations for two new sets of open source REST APIs. While the native Spark engines and the Presto engine continue to talk to the metadata layer over the HMS Thrift interface, the newly added Unity Catalog and Iceberg REST API interfaces are an alternative that allows both internal and external systems to interface with the metadata layer. This article is an overview of how you can leverage the new APIs in different ways using different types of clients.

  • Unity Catalog API — This one has created quite the buzz in the community ever since its launch a few months back. Databricks’ move to open source Unity Catalog has been welcomed by the industry for bringing the benefits of openness, flexibility and interoperability. Unity Catalog OSS is the open source implementation of this API. Unity Catalog is currently a sandbox project with the LF AI & Data Foundation (part of the Linux Foundation).
  • Iceberg REST Catalog API — The REST catalog was introduced in the Iceberg 0.14.0 release. Unlike the client-side catalogs (like the HiveCatalog and the HadoopCatalog), the implementation logic of a REST catalog lives on the server side. For a service provider of a REST catalog implementation, the server-side logic can be written in any language and use any custom technology, as long as the API follows the Iceberg REST OpenAPI specification.

Here are the API specs; you can read them to learn more about the support and scope details in watsonx.data (WXD):

https://cloud.ibm.com/apidocs/watsonxdata-mds-unity

https://cloud.ibm.com/apidocs/watsonxdata-ibm-mds-iceberg

Iceberg REST API on WXD: Java Client Point of View

From the lens of the Iceberg community:

As Apache Iceberg expanded its reach beyond Java, APIs for other languages like Python, Rust, and Go were introduced (see the Apache Iceberg Go SDK, Apache Iceberg Rust SDK, and PyIceberg). Practically speaking, the need to rebuild catalog client classes for each language surfaced several issues, including:

  • Duplicated efforts across languages: Each catalog implementation (e.g., HiveCatalog, HadoopCatalog, etc.) needed to be rewritten in every supported language (Python, Rust, Go, etc.).
  • Inconsistent behaviour: Inconsistencies began to emerge between how different Iceberg catalogs behaved in each language.
  • Diverging APIs: Catalogs in different languages often developed diverging APIs. For example, a feature or API method available in the Java client might not exist in the Python or Go clients, making it difficult for developers to move between languages while working with Iceberg.

With the introduction of the REST Catalog spec by Iceberg and its subsequent implementation by service providers, client libraries became much simpler: they only need to issue HTTP requests to interact with the catalog. This simplifies the client-side logic and reduces the risk of API divergence, creating a more scalable, maintainable, language-agnostic and efficient ecosystem.

From a WXD user point of view:

If you have been used to interacting with the watsonx.data metastore layer using the org.apache.iceberg.hive.HiveCatalog class in Apache Iceberg, you can continue to do so; the metadata service continues to support the Hive Metastore API interface. With this interface you don’t have the overhead of an engine like Spark or Presto and can directly interface with the metadata layer.
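As a rough illustration, here is a minimal Java sketch of that direct path; the Thrift endpoint and warehouse location are placeholders of mine, not the actual watsonx.data values:

import java.util.Map;
import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hive.HiveCatalog;

public class HiveCatalogExample {
    public static void main(String[] args) {
        // Talk to the metastore's HMS Thrift interface directly -- no Spark or Presto needed.
        HiveCatalog catalog = new HiveCatalog();
        catalog.initialize("wxd", Map.of(
            CatalogProperties.URI, "thrift://<metastore-host>:9083",          // placeholder endpoint
            CatalogProperties.WAREHOUSE_LOCATION, "s3a://<bucket>/warehouse"  // placeholder warehouse
        ));

        // Load an existing Iceberg table and inspect its current snapshot.
        Table table = catalog.loadTable(TableIdentifier.of("demo_ns", "orders"));
        System.out.println(table.currentSnapshot());
    }
}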

However, if you want to use org.apache.iceberg.rest.RESTCatalog to leverage the new capabilities, you can do that as well. So functionally, what exactly does the REST Catalog get you that is not in HiveCatalog? Out of the 24 APIs in the 1.6.1 version of the REST catalog spec, the most interesting are the following (a minimal Java sketch follows the list). The commit APIs open up interesting possibilities that were not there before.

/v1/{prefix}/namespaces/{namespace}/register - Register a table using an existing metadata location
/v1/{prefix}/namespaces/{namespace}/tables - Create a table
/v1/{prefix}/namespaces/{namespace}/tables/{table} - Commit updates to a table
/v1/{prefix}/namespaces/{namespace}/tables/{table} - Get the metadata of a table
+
/v1/{prefix}/transactions/commit - Commit updates to multiple tables in an atomic transaction
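As a minimal sketch of what the register capability looks like from Java: the endpoint URI, token and metadata path below are placeholder assumptions, not the exact watsonx.data values (see the API spec links above for those).

import java.util.Map;
import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.rest.RESTCatalog;

public class RestCatalogExample {
    public static void main(String[] args) {
        // Initialize the Iceberg REST client against the metadata service endpoint.
        RESTCatalog catalog = new RESTCatalog();
        catalog.initialize("wxd", Map.of(
            CatalogProperties.URI, "https://<wxd-host>/mds/iceberg", // placeholder endpoint
            "token", "<bearer-token>"                                // bearer authentication
        ));

        // The /register endpoint: attach an existing table by its metadata file location,
        // without rewriting any data. The path below is hypothetical.
        catalog.registerTable(
            TableIdentifier.of("demo_ns", "orders"),
            "s3a://<bucket>/warehouse/demo_ns/orders/metadata/v3.metadata.json");
    }
}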
Java Client to talk to Iceberg catalog — REST Catalog vs IcebergHiveCatalog

Here’s a detailed three-part blog from my colleague Hemant Marve on how you can leverage the REST Catalog in watsonx.data.

Part 1: https://medium.com/@hemant.marve/iceberg-rest-api-on-wxd-java-client-part-1-ac82028433b7

Part 2: https://medium.com/@hemant.marve/iceberg-rest-api-on-wxd-java-client-part-2-4d3af16ae96c

Part 3: https://medium.com/@hemant.marve/iceberg-rest-api-on-wxd-java-client-part-3-appending-data-files-23fa80e2eede

watsonx.data Iceberg REST API: Interoperability with the Ecosystem

1) Connect to watsonx.data from Apache Spark on the Iceberg REST API

Just as you can connect from Spark to the HMS interface via the Thrift protocol for metadata operations, you can also connect to the Iceberg REST Catalog via its HTTPS interface. Service providers of a REST catalog implementation have the flexibility to decide how the metadata layer is implemented, be it a database, HMS or some custom implementation.

If you have an Apache Spark system outside of watsonx.data, like Analytics Engine or any other system, you can execute your Spark applications against metadata in watsonx.data. For the metadata, Spark connects through the Iceberg REST Catalog using bearer or basic authentication.
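For illustration, a minimal sketch of such an external Spark session in Java follows; the catalog name, endpoint URI and token are placeholders of mine, and the blog below has the exact configuration for watsonx.data:

import org.apache.spark.sql.SparkSession;

public class SparkToWxdRest {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("external-spark-to-wxd")
            // Enable Iceberg SQL extensions and define an Iceberg REST catalog named "wxd".
            .config("spark.sql.extensions",
                    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
            .config("spark.sql.catalog.wxd", "org.apache.iceberg.spark.SparkCatalog")
            .config("spark.sql.catalog.wxd.type", "rest")
            .config("spark.sql.catalog.wxd.uri", "https://<wxd-host>/mds/iceberg") // placeholder endpoint
            .config("spark.sql.catalog.wxd.token", "<bearer-token>")               // bearer auth
            .getOrCreate();

        // Metadata operations now go over HTTPS to the watsonx.data REST catalog.
        spark.sql("SHOW NAMESPACES IN wxd").show();
    }
}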

This blog from Hemant Marve has the details.

BLOG: https://medium.com/@hemant.marve/connecting-apache-spark-to-watsonx-data-using-the-iceberg-rest-api-f0bb4f0608cc

External Spark connects to Iceberg REST Catalog on MDS/WXD

2) Connect to Snowflake Open Catalog from watsonx.data Spark

If you wish to execute Spark applications from watsonx.data against Iceberg tables in Snowflake Open Catalog, you can do so by configuring your WXD Spark to connect to the Iceberg REST Catalog endpoint on Snowflake.
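As a hedged sketch, the WXD Spark configuration might look like the following; the OAuth client-credential properties (credential, scope, warehouse) come from the standard Iceberg REST client options, and the account URL and names are placeholders, so treat the blog below as the authoritative recipe:

import org.apache.spark.sql.SparkSession;

public class WxdSparkToSnowflakeOpenCatalog {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("wxd-spark-to-snowflake-open-catalog")
            .config("spark.sql.extensions",
                    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
            // Define Snowflake Open Catalog as an Iceberg REST catalog named "sfoc".
            .config("spark.sql.catalog.sfoc", "org.apache.iceberg.spark.SparkCatalog")
            .config("spark.sql.catalog.sfoc.type", "rest")
            // Placeholders: account URL, OAuth client credentials, catalog name and role scope.
            .config("spark.sql.catalog.sfoc.uri",
                    "https://<account>.snowflakecomputing.com/polaris/api/catalog")
            .config("spark.sql.catalog.sfoc.credential", "<client-id>:<client-secret>")
            .config("spark.sql.catalog.sfoc.warehouse", "<open-catalog-name>")
            .config("spark.sql.catalog.sfoc.scope", "PRINCIPAL_ROLE:ALL")
            .getOrCreate();

        // Metadata calls now go to Snowflake's Iceberg REST endpoint.
        spark.sql("SHOW NAMESPACES IN sfoc").show();
    }
}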

This blog from Hemant Marve has the details.

BLOG: https://medium.com/@hemant.marve/c2f72bd3d102

WXD Spark talks to Snowflake Open Catalog on Iceberg REST Catalog API

3) Query watsonx.data Iceberg tables from Snowflake

If you have Iceberg tables in watsonx.data that you would like to import into the Snowflake catalog and query from Snowflake, you can set up an integration that brings the tables in as “external” tables from WXD into the Snowflake catalog. The integration is set up using the Iceberg REST API provided in WXD. This blog from Hemant Marve explains how to go about it.

BLOG: https://medium.com/@hemant.marve/59d7966f5e62

Sync Data from watsonx.data Iceberg tables into Snowflake Catalog

4) Join WXD tables and Databricks tables from a notebook in Databricks

Learn how to join a Delta table created in Databricks with an Iceberg table in WXD, using Databricks compute and a PySpark notebook.
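The blog uses a PySpark notebook; the same idea, sketched here in Java with hypothetical catalog and table names, is to register WXD as an additional Iceberg REST catalog on the Databricks compute and then join across the two catalogs in a single SQL statement:

import org.apache.spark.sql.SparkSession;

public class CrossCatalogJoin {
    public static void main(String[] args) {
        // Assumes a Spark session where "wxd" is already configured as an Iceberg REST
        // catalog pointing at watsonx.data, alongside the workspace's own Delta catalog.
        SparkSession spark = SparkSession.builder().appName("wxd-databricks-join").getOrCreate();

        // Hypothetical names: a Delta table in Databricks joined with an Iceberg table in WXD.
        spark.sql(
            "SELECT d.customer_id, d.order_total, w.segment " +
            "FROM main.sales.orders d " +              // Delta table in Databricks
            "JOIN wxd.demo_ns.customer_segments w " +  // Iceberg table in watsonx.data
            "ON d.customer_id = w.customer_id"
        ).show();
    }
}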

Read all about it in Hemant Marve’s blog:

https://medium.com/@hemant.marve/1defe3bb50cb

watsonx.data Unity Catalog API: Interoperability with the Ecosystem

1. Apache Spark Interoperability

Interact from Apache Spark with Unity Catalog in MDS/WXD

You can interact with watsonx.data over the Unity Catalog API and execute Spark applications from any external Spark. As of now, vended credentials are supported for ADLS and GCS object storage systems. As of this writing, some DML statements like ALTER are not supported in the spark-unity client jar. This blog by my colleague, @anuragd916, explains the details with a sample application.
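For a rough orientation before diving into the blog, here is a minimal sketch of attaching an external Spark session over the Unity Catalog API. The connector class and properties are from the Unity Catalog OSS Spark integration; the endpoint path, token and table names are placeholder assumptions of mine:

import org.apache.spark.sql.SparkSession;

public class SparkToWxdUnity {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("external-spark-to-wxd-unity")
            // Unity Catalog OSS Spark connector; endpoint path and token are placeholders.
            .config("spark.sql.catalog.unity", "io.unitycatalog.spark.UCSingleCatalog")
            .config("spark.sql.catalog.unity.uri", "https://<wxd-host>/mds/unity")
            .config("spark.sql.catalog.unity.token", "<bearer-token>")
            .config("spark.sql.defaultCatalog", "unity")
            .getOrCreate();

        // Read a table registered in the watsonx.data Unity Catalog (hypothetical names).
        spark.sql("SELECT * FROM unity.demo_schema.orders LIMIT 10").show();
    }
}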

Blog: https://medium.com/@anuragd916/ibm-watsonx-data-integration-with-unity-catalog-simplified-2a17cf62b4c4

2. Databricks Interoperability

You can connect to Databricks from the Spark engine in watsonx.data and execute Spark SQL applications. This blog from my colleague Dixon Antony gives a practical example of how you can do that. As of this writing, some DML statements like ALTER are not supported in the spark-unity client jar; CREATE TABLE is also not supported due to current restrictions.
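As a hedged sketch of that setup, assuming the Databricks workspace exposes its Unity Catalog REST APIs under /api/2.1/unity-catalog and has a catalog named main; the workspace URL, token and table name are placeholders:

import org.apache.spark.sql.SparkSession;

public class WxdSparkToDatabricksUnity {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("wxd-spark-to-databricks-unity")
            // Map the Databricks Unity catalog "main" via the Unity Catalog OSS Spark connector.
            .config("spark.sql.catalog.main", "io.unitycatalog.spark.UCSingleCatalog")
            .config("spark.sql.catalog.main.uri",
                    "https://<workspace-host>/api/2.1/unity-catalog")   // placeholder workspace URL
            .config("spark.sql.catalog.main.token", "<databricks-pat>") // personal access token
            .getOrCreate();

        // Read-only in practice: ALTER and CREATE TABLE are not supported here (see above).
        spark.sql("SELECT * FROM main.demo_schema.demo_table LIMIT 10").show();
    }
}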

BLOG: https://medium.com/@dixonantony/connecting-databricks-from-watsonx-data-spark-engine-using-unity-catalog-open-apis-d9f7f7d1b636

Using Spark Engine from watsonx.data to Read Data from Databricks tables

Conclusion: The Beginning

The new API features implemented in watsonx.data are only the beginning of the story in promoting open standards and interoperability in the ecosystem. As the space evolves with more features and functionality, it will only strengthen the case.

Acknowledging contributions from colleagues: Hemant, Anurag, Althaf, Anjali, Shivangi, Dixon. Thanks to guidance and support from Gopi & Kulki


Written by Mrudula Madiraju

Dealing with Data, Cloud, Compliance and sharing tidbits of epiphanies along the way.
