
Connecting to Streambased from Jupyter

A step-by-step guide to integrating Jupyter with Streambased, unlocking powerful capabilities for interactive data exploration and analysis on streaming data.
Pradeep Sekar
Devrel
31/01/2024

Pre-requisites

You must have a running Streambased server before following this guide.

For details on how to run Streambased, see the documentation at https://streambased-io.github.io/streambased/index.html or run one of the demos at https://github.com/streambased-io/streambased-demos

Step 1: Install dependencies

Connecting to Streambased from Jupyter requires the following Python packages:

				
					pip install jupyterlab
					pip install jupysql
					pip install sqlalchemy-trino
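
If you want to confirm the installs landed in the environment your notebook will use, a small check like the one below may help. It is only a sketch: the helper name `missing_packages` is made up for this illustration, and it looks packages up by the same names passed to pip above.

```python
from importlib.metadata import PackageNotFoundError, version

# The pip package names from Step 1.
REQUIRED = ["jupyterlab", "jupysql", "sqlalchemy-trino"]


def missing_packages(names):
    """Return the package names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the package is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


# An empty list means every package from Step 1 was found.
print(missing_packages(REQUIRED))
```

Any name that is printed still needs a `pip install` before you continue.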

				
			

Step 2: Start the notebook

Launch a notebook directly with:
				
					jupyter lab
				
			

Step 3: Load the SQL extension

From your notebook load the SQL extension:

				
					%load_ext sql
				
			

Step 4: Connect to Streambased

Next, connect to your Streambased server:
				
					%sql trino://localhost:8080/kafka
				
			
Note: This assumes that your Streambased server is running locally on port 8080 and that you are connecting to the kafka catalog. Adjust the host and port to match your deployment.
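
To make the parts of that connection string explicit, here is a minimal sketch that assembles it from a host, port and catalog. The function name `streambased_url` and the remote host and port in the second call are hypothetical and only illustrate the substitution; the defaults mirror the local setup assumed above.

```python
def streambased_url(host="localhost", port=8080, catalog="kafka"):
    """Build the connection string that follows %sql in Step 4."""
    return f"trino://{host}:{port}/{catalog}"


# Local deployment, as assumed in this guide.
print(streambased_url())

# Hypothetical remote deployment; replace with your own host and port.
print(streambased_url(host="streambased.example.internal", port=8443))
```

The printed string is what you would place after `%sql` when connecting.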

Step 5: Run a query

Now we can run a query:

				
					%sql SELECT * FROM kafka.streambased.customers WHERE name = 'TOM SCOTT'
				
			

Step 6 (optional): Use pandas

To use Streambased with pandas, first return to Step 1 and make sure the pandas library is installed:
				
					pip install pandas
				
			
Then assign the query result to a variable, convert it to a DataFrame and work with it as you would any other pandas data:
				
					result = %sql SELECT * FROM kafka.streambased.customers WHERE name = 'TOM SCOTT'
					df = result.DataFrame()
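
Once you have a DataFrame, the usual pandas operations apply. The sketch below uses a small stand-in DataFrame so it can run without a server: the `id` column and both rows are hypothetical, and only the `name` column is taken from the query above. In your notebook, `df` would come from `result.DataFrame()` instead.

```python
import pandas as pd

# Stand-in for `df = result.DataFrame()`; the `id` column and the rows are hypothetical.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "name": ["TOM SCOTT", "TOM SCOTT"],
    }
)

# Inspect the size and columns of the result.
print(df.shape)
print(list(df.columns))

# Confirm the WHERE clause did its job: only one distinct name should remain.
distinct_names = list(df["name"].unique())
print(distinct_names)
```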

				
			