Connecting To Streambased From Jupyter

A step-by-step guide to integrating Jupyter with Streambased, unlocking powerful capabilities for interactive data exploration and analysis on streaming data.

Integrate With Your Analytical Tools
June 10, 2024
You must have a running Streambased server before following this guide.

For details on how to run Streambased, see the documentation here: https://streambased-io.github.io/streambased/index.html or run one of the demos here: https://github.com/streambased-io/streambased-demos
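Before continuing, it can help to confirm that something is actually listening at the server address. The following is a minimal sketch using only the Python standard library. The helper name `server_reachable` is hypothetical, and the default host and port (`localhost`, `8080`) are assumptions taken from the connection string used in Step 4. It only checks that a TCP connection can be opened, not that the service behind it is Streambased.

```python
import socket


def server_reachable(host="localhost", port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: the defaults mirror the address assumed in Step 4.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Server reachable:", server_reachable())
```

If this prints `False`, check that the server is running and that the host and port match your deployment.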


Step 1: Install dependencies


This guide requires the following Python packages:


pip install jupyterlab
pip install jupysql
pip install sqlalchemy-trino


Step 2: Start the notebook


Launch JupyterLab from your terminal, then create a new notebook:


jupyter lab


Step 3: Load the SQL extension


From your notebook, load the SQL extension:


%load_ext sql


Step 4: Connect to Streambased


Next, connect to your Streambased server:


%sql trino://localhost:8080/kafka


Note: This assumes that your Streambased server is running locally. Adjust the host and port to match your deployment.
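If your deployment uses a different host or port, the connection string can be assembled from its parts. The sketch below is illustrative only: `streambased_url` is a hypothetical helper, and its defaults reproduce the `trino://localhost:8080/kafka` string shown above.

```python
def streambased_url(host="localhost", port=8080, catalog="kafka"):
    """Build a trino:// connection string for the %sql magic.

    Hypothetical helper: the defaults match the local setup assumed in this guide.
    """
    return f"trino://{host}:{port}/{catalog}"


# Local default, identical to the string used in Step 4.
print(streambased_url())

# Example for a remote deployment (host name and port are placeholders).
print(streambased_url(host="streambased.example.com", port=8443))
```

Pass the printed string to `%sql` in place of the local address.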


Step 5: Run a query


Now we can run a query:


%sql SELECT * FROM kafka.streambased.customers WHERE name = 'TOM SCOTT'



Step 6 (optional): Use pandas


To use Streambased with pandas, first make sure the pandas library is installed alongside the dependencies from Step 1:


pip install pandas


Then capture your query result, convert it into a DataFrame, and work with it as you would any other:


result = %sql SELECT * FROM kafka.streambased.customers WHERE name = 'TOM SCOTT'
df = result.DataFrame()



