Prerequisites

You must have a running Streambased server before following this guide.
For details on how to run Streambased, see the documentation at https://streambased-io.github.io/streambased/index.html or run one of the demos at https://github.com/streambased-io/streambased-demos.
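Before continuing, you may want to confirm that the server is accepting connections. The sketch below is a plain TCP check that uses only the Python standard library. The helper name `is_reachable` is hypothetical, and the default host and port (`localhost`, `8080`) are assumptions taken from the connection string used in Step 4. A successful TCP connection only shows that something is listening on that port. It does not prove that the listener is a healthy Streambased server.

```python
import socket


def is_reachable(host: str = "localhost", port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: it only proves that something is listening on the port,
    not that the listener is a working Streambased server.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures
        return False


if __name__ == "__main__":
    # Assumed defaults matching Step 4; change them to match your deployment
    print("Streambased reachable:", is_reachable("localhost", 8080))
```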

Step 1: Install dependencies

This guide requires the following Python packages:

pip install jupyterlab
pip install jupysql
pip install sqlalchemy-trino
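To confirm from Python that the packages installed correctly, you can check whether their modules can be located. The sketch below uses the standard library's `importlib.util.find_spec`. The helper `missing_modules` is hypothetical. The import names it checks are assumptions and may differ between package versions: `jupyterlab`, `sql` for JupySQL (the same name passed to `%load_ext` in Step 3), and `sqlalchemy_trino` for sqlalchemy-trino.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the subset of `names` that cannot be located in this environment."""
    # find_spec returns None when a top-level module cannot be found
    return [name for name in names if find_spec(name) is None]


# Assumed import names for the packages installed above
REQUIRED = ["jupyterlab", "sql", "sqlalchemy_trino"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("All packages found" if not missing else f"Missing: {missing}")
```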

Step 2: Start the notebook

Launch JupyterLab with the following command, then create a new notebook:

jupyter lab

Step 3: Load the SQL extension

In a notebook cell, load the JupySQL extension:

%load_ext sql

Step 4: Connect to Streambased

Next, connect to your Streambased server:

%sql trino://localhost:8080/kafka

Note: This assumes that your Streambased server is running locally. Adjust the host and port to match your deployment.
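If the server is not running locally, only the host and port in the connection string change. The hypothetical helper below builds a connection string in the same `trino://host:port/catalog` form used above, with `kafka` as the default catalog to match the example. It does not add a user name or credentials, which a secured deployment may require.

```python
def streambased_url(host: str = "localhost", port: int = 8080, catalog: str = "kafka") -> str:
    """Build a connection string in the same trino://host:port/catalog form as the %sql example."""
    # Reject ports outside the valid TCP range before building the string
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"trino://{host}:{port}/{catalog}"


if __name__ == "__main__":
    # Hypothetical remote host; replace it with your own deployment's address
    print(streambased_url("streambased.example.internal", 8080))
```

The printed string can then be used in place of `trino://localhost:8080/kafka` on the `%sql` line.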

Step 5: Run a query

Now we can run a query:

%sql SELECT * FROM kafka.streambased.customers WHERE name = 'TOM SCOTT'
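When the filter value comes from a Python variable instead of being typed into the query, it must be quoted as a SQL string literal. The sketch below follows the standard SQL rule of doubling embedded single quotes. The helper names are hypothetical, and the table name `kafka.streambased.customers` is taken from the example above. Where your client library supports bound query parameters, prefer them over building SQL strings by hand.

```python
def sql_string_literal(value: str) -> str:
    """Quote `value` as a standard SQL string literal, doubling any single quotes."""
    return "'" + value.replace("'", "''") + "'"


def customers_by_name(name: str) -> str:
    """Build the Step 5 query for an arbitrary customer name (hypothetical helper)."""
    return (
        "SELECT * FROM kafka.streambased.customers "
        f"WHERE name = {sql_string_literal(name)}"
    )


if __name__ == "__main__":
    print(customers_by_name("TOM SCOTT"))
    # A name containing a quote is escaped rather than breaking the query
    print(customers_by_name("MILES O'BRIEN"))
```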

Step 6: (optional) Use pandas

To use Streambased with pandas, first make sure the pandas library is installed in the same environment as the packages from Step 1:

pip install pandas

Then convert your query result into a pandas DataFrame and work with it as usual:

# Run the query and keep the result set, then convert it to a DataFrame
result = %sql SELECT * FROM kafka.streambased.customers WHERE name = 'TOM SCOTT'
df = result.DataFrame()
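Once the result is a DataFrame, the usual pandas tools apply. The actual columns depend on the schema of your topic, so the sketch below builds a small stand-in DataFrame with hypothetical `name` and `country` columns in place of `result.DataFrame()`. It then inspects and aggregates that stand-in.

```python
import pandas as pd

# Hypothetical stand-in for `df = result.DataFrame()`; real column names
# depend on the schema of the customers topic in your deployment.
df = pd.DataFrame(
    {
        "name": ["TOM SCOTT", "TOM SCOTT", "JANE DOE"],
        "country": ["UK", "UK", "US"],
    }
)

# Inspect the first rows and the overall shape (rows, columns)
print(df.head())
print(df.shape)

# Count rows per country
counts = df.groupby("country").size()
print(counts)
```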