Logging in Serverless Spark (Part 1)

What can logs tell you about user errors (and output)?

Introduction

Analytics Engine on cloud.ibm.com is a managed service that lets you submit Spark applications without setting up or installing Spark yourself. When you submit a Python, Scala or R application, a Spark cluster is launched in the background to run your code against the given workload, and you are charged only for the resources your application uses. Learn how to create an Analytics Engine Serverless Spark instance and submit a Spark application here.

If you don’t have the time, just read the quick summary.

Enabling Logging for Serverless Spark

When you submit an application, you will want to see its logs. The first step is to enable the LogDNA service to work with Analytics Engine; see "Setting up Logging for Analytics Engine - Serverless Spark".

Once you submit the application, you can filter by instance ID or application ID in the filter box at the bottom and see only the logs you are interested in. In the following examples, I have filtered by the instance ID.
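If you export log lines and want to apply the same kind of filter offline, a plain substring match is usually enough. The snippet below is only a sketch: the sample lines and IDs are made-up placeholders, not real LogDNA output, and `filter_log_lines` is a hypothetical helper name.

```python
from typing import Iterable, List


def filter_log_lines(lines: Iterable[str], identifier: str) -> List[str]:
    """Return the log lines that mention the given instance or application ID."""
    needle = identifier.strip()
    if not needle:
        raise ValueError("identifier must not be empty")
    return [line for line in lines if needle in line]


# Made-up lines with placeholder IDs, only to exercise the filter.
sample_lines = [
    "instance=INSTANCE-A app=APP-1 starting driver",
    "instance=INSTANCE-B app=APP-2 starting driver",
    "instance=INSTANCE-A app=APP-3 job finished",
]

matches = filter_log_lines(sample_lines, "INSTANCE-A")
print(len(matches))
```

Filtering by instance ID returns the lines of every application in that instance; filtering by application ID narrows the result to a single run.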

Case1: Quick Start WordCount Application — Good Case

One of the first applications you will try out on the AE Serverless Spark service is the built-in word count Spark application, following the steps here.
The application prints the count of words in the example file. Here is how that output appears in the logs.
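For orientation, submitting the word count application comes down to a POST request with a small JSON body. The sketch below only builds the URL and body and sends nothing. The URL pattern, the `application_details` layout and the example paths are assumptions based on my reading of the IBM Cloud documentation, and `build_submission` is a hypothetical helper, so verify them against the current API reference before relying on them.

```python
import json
from typing import Sequence, Tuple

# Assumed URL pattern; check the region and API version for your instance.
API_TEMPLATE = (
    "https://api.{region}.ae.cloud.ibm.com"
    "/v3/analytics_engines/{instance_id}/spark_applications"
)


def build_submission(
    region: str, instance_id: str, application: str, arguments: Sequence[str]
) -> Tuple[str, str]:
    """Return the (url, json_body) pair for a Spark application submission."""
    url = API_TEMPLATE.format(region=region, instance_id=instance_id)
    body = {
        "application_details": {
            "application": application,
            "arguments": list(arguments),
        }
    }
    return url, json.dumps(body)


# Placeholder instance ID; the two paths are assumed locations of the
# bundled word count example and its input file.
url, body = build_submission(
    "us-south",
    "INSTANCE_ID_PLACEHOLDER",
    "/opt/ibm/spark/examples/src/main/python/wordcount.py",
    ["/opt/ibm/spark/examples/src/main/resources/people.txt"],
)
print(url)
print(body)
```

The request also needs an IAM bearer token in the `Authorization` header, which is left out here.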

Now let’s look at some error cases:

Case2: Submitting application where file does not exist in the location

Case3: Invalid Syntax in submitted python file
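A syntax error like this can be caught before submission by parsing the file locally. This is a minimal sketch using the standard library `ast` module; `find_syntax_error` is a hypothetical helper. It checks the source against your local Python version, which may differ from the one on the cluster.

```python
import ast
from typing import Optional


def find_syntax_error(source: str, filename: str = "<app>") -> Optional[str]:
    """Return a short description of the first syntax error, or None if the source parses."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


good_source = "print('ok')\n"
bad_source = "def broken(:\n    pass\n"

print(find_syntax_error(good_source))
print(find_syntax_error(bad_source))
```

To check a real file, read it with `open(path).read()` and pass the path as `filename`, so the message points at the right file.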

Case4: Customization Failure

Customization is the feature that lets you set up Python, R and other packages for use by your Spark applications. For example, you can specify pip or conda as the package manager, depending on the package you want to install. In this case, I used conda instead of pip, and the error tells me that the package is not available.
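The choice of package manager is part of the library set definition that you pass to the customization application. The sketch below builds such a definition as a JSON string. The `library_set` layout is an assumption based on my reading of the IBM Cloud documentation, and `build_library_set_argument` and the set name are hypothetical, so treat it as an illustration and check the current docs for the exact format.

```python
import json
from typing import Sequence

SUPPORTED_MANAGERS = ("pip", "conda")


def build_library_set_argument(
    name: str, manager: str, packages: Sequence[str]
) -> str:
    """Build the JSON argument describing a Python library set (assumed layout)."""
    if manager not in SUPPORTED_MANAGERS:
        raise ValueError(f"manager must be one of {SUPPORTED_MANAGERS}")
    if not packages:
        raise ValueError("at least one package is required")
    spec = {
        "library_set": {
            "action": "add",
            "name": name,
            # The package manager is a key here, so switching between
            # pip and conda changes only this one level.
            "libraries": {manager: {"python": {"packages": list(packages)}}},
        }
    }
    return json.dumps(spec)


pip_argument = build_library_set_argument("my_library_set", "pip", ["numpy"])
print(pip_argument)
```

If a package exists on PyPI but not in the conda channels, the same definition with `"conda"` fails with a "package not available" style error, which is the situation described above.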

Case4: Customization Success — Good Case

Case5: df.show()
Here’s a simple application that shows a DataFrame and also prints some tracing statements from the user code.
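The original listing is not reproduced here, so the following is a minimal sketch of what such an application could look like. The rows, column names, app name and the `[user-trace]` prefix are all made up for illustration. Both `print` and `df.show()` write to the driver's standard output, which is why they end up side by side in the logs.

```python
def trace(message: str) -> str:
    """Print a user trace line to stdout so it shows up in the driver log."""
    line = f"[user-trace] {message}"
    print(line, flush=True)
    return line


def main() -> None:
    # Imported inside main() so that trace() can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("df-show-logging-demo").getOrCreate()
    trace("Spark session created")

    # Illustrative data only.
    rows = [("alice", 34), ("bob", 45)]
    df = spark.createDataFrame(rows, ["name", "age"])
    trace(f"DataFrame has {df.count()} rows")

    # show() prints the tabular view of the DataFrame to stdout.
    df.show()

    trace("Application finished")
    spark.stop()
```

To run this as a submitted application, end the file with `if __name__ == "__main__": main()`. A recognizable prefix on the trace lines makes them easy to search for among the Spark log messages.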

See how it shows up in the logs:

Case6: Wrong COS endpoint when submitting spark application

Case7: Wrong COS Credentials when submitting spark application
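Both of these COS cases come from the Spark configuration passed with the application. The sketch below assembles the three COS settings and rejects obviously malformed values. The `spark.hadoop.fs.cos.<service>.*` key names follow the Stocator convention as I understand it from the IBM documentation, and `build_cos_conf`, the service name and the values are hypothetical, so confirm the keys before use.

```python
from typing import Dict
from urllib.parse import urlparse


def build_cos_conf(
    service: str, endpoint: str, access_key: str, secret_key: str
) -> Dict[str, str]:
    """Build the Spark conf entries for one COS service (assumed key names)."""
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"endpoint does not look like a URL: {endpoint!r}")
    if not access_key or not secret_key:
        raise ValueError("access key and secret key must not be empty")
    prefix = f"spark.hadoop.fs.cos.{service}"
    return {
        f"{prefix}.endpoint": endpoint,
        f"{prefix}.access.key": access_key,
        f"{prefix}.secret.key": secret_key,
    }


# Placeholder values only; never hard-code real credentials.
conf = build_cos_conf(
    "myservice",
    "https://cos-endpoint.example.com",
    "ACCESS_KEY_PLACEHOLDER",
    "SECRET_KEY_PLACEHOLDER",
)
for key in sorted(conf):
    print(key)
```

A check like this catches only malformed input. A well-formed but wrong endpoint, or keys that COS rejects, still fails at run time, and the logs are where you find out.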

Conclusion

This was a quick write-up to get you started on commonly encountered errors and how you can correct them. The next article in this series will cover some more aspects of logging in Analytics Engine Serverless Spark.

Senior Consultant, IBM Cloud. Sharing titbits of epiphanies...

Mrudula Madiraju
