Jupyter is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.
Starting with Fusion 5.0.2, we provide a Jupyter service that can be run from the Fusion Helm chart. See How to enable Jupyter.
This feature is not available in Managed Fusion environments.
Jupyter is evolving, and it has a number of kernels that support different programming languages. With Jupyter, you can run Spark in Scala or Python and run Fusion SQL in Python at the same time. This versatility gives you options for debugging and trying out various Fusion features. By integrating BeakerX, we get access to a wide variety of kernels and visualization features. And by configuring Jupyter, we can hide it behind the Fusion gateway (proxy) service and avoid exposing the Jupyter IP externally.
What this service is for
Run and debug Spark code in Scala or Python (a replacement for spark-shell)
Run SQL queries via Fusion SQL
Debug Scala and SQL transforms in PBL jobs
Everything else for which Jupyter is designed
What this service is not for
This service is not for running Spark jobs in a Kubernetes cluster.
The Jupyter pod does not have permission to create or delete pods in a Kubernetes cluster, so you cannot run jobs in the cluster from a Jupyter notebook. However, Spark local mode can be used with a higher driver memory if needed.
We recommend debugging in Jupyter against a sample of the data and then running the actual job using the Fusion jobs UI.
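The local-mode approach described above can be sketched in a Python notebook as follows. This is a minimal sketch rather than a Fusion-documented setup: it assumes PySpark is importable in the notebook kernel, and the helper names (`local_spark_conf`, `build_session`, `sample_for_debugging`), the `4g` memory value, and the sampling fraction are illustrative. The driver memory setting only takes effect if it is applied before the first Spark session is created in the kernel.

```python
def local_spark_conf(driver_memory="4g", cores="*"):
    """Build Spark settings for local mode with a larger driver heap.

    The default values are illustrative assumptions, not Fusion defaults.
    """
    return {
        "spark.master": f"local[{cores}]",     # run Spark inside the Jupyter pod
        "spark.driver.memory": driver_memory,  # must be set before the JVM starts
    }


def build_session(app_name, conf):
    """Create (or reuse) a SparkSession with the given settings."""
    # Imported lazily so the helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def sample_for_debugging(df, fraction=0.01, seed=42):
    """Return a small random sample of a DataFrame for interactive debugging."""
    return df.sample(fraction=fraction, seed=seed)
```

In a notebook cell, `spark = build_session("debug", local_spark_conf("8g"))` would start a local session. Once the logic works on the sample, run the full job from the Fusion jobs UI.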
How to enable Jupyter
Jupyter can be enabled with the Fusion Helm chart. It is not enabled by default.
Add the following to your custom
fusion-jupyter:
  enabled: true
Verify that Jupyter is available at
Run the following:
helm upgrade <release-name> <helm-repo>/fusion --values values.yaml --version 5.2.0
Be sure to specify your version of Fusion.
The Jupyter pod IP is a ClusterIP and is not exposed externally. We recommend accessing it via the Fusion proxy and not exposing the pod IP via
Jupyter can also be accessed via port forwarding and is available via
Idle notebooks and kernels are shut down after 60 minutes of inactivity. Note that any additional libraries installed via pip will need to be reinstalled when the pod is recycled.
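Because pip-installed libraries are lost when the pod is recycled, one option is to start each notebook with a cell that reinstalls whatever is missing. The following is a minimal sketch, assuming the kernel's Python has pip available and the pod can reach a package index. The helper names are hypothetical, and the sketch assumes each package's import name matches its pip name, which is not always the case.

```python
import importlib.util
import subprocess
import sys


def missing_packages(names):
    """Return the top-level module names that cannot be imported in this kernel."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ensure_installed(names):
    """Install any missing packages with the kernel's own pip and return their names."""
    missing = missing_packages(names)
    if missing:
        # Use sys.executable so packages land in the environment the kernel runs in.
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])
    return missing
```

Calling `ensure_installed([...])` with the libraries a notebook depends on, as its first cell, makes the notebook usable again after a pod restart without manual steps.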
The image includes Python 3.6+.
Customers who run JupyterHub can use lucidworks/fusion-jupyter to launch notebooks with the Fusion Jupyter image.