Connect Jupyter to Kafka

A step-by-step guide to connecting Jupyter to Streambased so that you can explore and analyze streaming Kafka data interactively with SQL.

Prerequisites

Install the following packages in your Python environment:
pip install jupyterlab
pip install jupysql
pip install sqlalchemy-trino
pip install pandas
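To confirm that the installation succeeded before going further, the packages can be checked from Python. This is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, and the list simply mirrors the `pip install` commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the pip commands above.
REQUIRED = ["jupyterlab", "jupysql", "sqlalchemy-trino", "pandas"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(REQUIRED)
for name, ver in versions.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED needs to be installed into the same environment that Jupyter runs in.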
Step 1: Start the notebook

Launch JupyterLab from a terminal and open a new notebook:
jupyter lab
Step 2: Create a database engine

From your notebook, create a database engine with SQLAlchemy's create_engine:

from sqlalchemy.engine import create_engine

# Connect to the "kafka" catalog over HTTPS, with "streambased" as the default schema
engine = create_engine(
    "trino://streambased.cloud:8443/kafka",
    connect_args={"http_scheme": "https", "schema": "streambased"},
)
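The connection URL packs several settings into one string. As a rough illustration, the sketch below splits the URL from this step with the standard library (not SQLAlchemy's own parser). Reading the path segment as the Trino catalog assumes the usual `trino://host:port/catalog` layout, and `describe_trino_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def describe_trino_url(url):
    """Break a trino:// connection URL into its main components."""
    parts = urlsplit(url)
    return {
        "dialect": parts.scheme,              # selects the SQLAlchemy dialect
        "host": parts.hostname,               # server to connect to
        "port": parts.port,                   # server port
        "catalog": parts.path.lstrip("/"),    # assumed: first path segment is the catalog
    }


info = describe_trino_url("trino://streambased.cloud:8443/kafka")
print(info)
```

The schema is not part of the URL in this guide; it is supplied separately through `connect_args`.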
Step 3: Load the SQL extension

From your notebook, load the JupySQL extension, which provides the %sql magic:
%load_ext sql
Step 4: Connect the SQL extension to the database

From your notebook, pass the engine to the %sql magic so that subsequent queries run against it:
%sql engine
Step 5: Run a query

You can now run a query against the demo_transactions table:
%sql SELECT * FROM demo_transactions
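While exploring, it can help to cap the number of rows returned rather than selecting everything. The sketch below builds such a query as a plain string. `limited_query` is a hypothetical helper, not part of JupySQL or Streambased, and passing the result to the magic assumes that your installed JupySQL version supports `{{variable}}` templating.

```python
import re


def limited_query(table: str, limit: int = 10) -> str:
    """Build a SELECT with a LIMIT clause for a simple table name."""
    # Accept only plain identifiers, since the name is interpolated into SQL.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f"SELECT * FROM {table} LIMIT {int(limit)}"


query = limited_query("demo_transactions", limit=10)
print(query)
```

In a notebook cell, the string could then be run with `%sql {{query}}`, under the templating assumption stated above.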
Step 6: (Optional) Convert the result to a pandas DataFrame

Assign the query result to a variable, then convert it to a pandas DataFrame:

transactions = %sql SELECT * FROM demo_transactions
df = transactions.DataFrame()
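Once the data is in a DataFrame, the usual pandas operations apply. The sketch below uses a small stand-in DataFrame so that it runs without a connection. The column names `account_id` and `amount` are hypothetical and may not match the real demo_transactions schema, and `total_by_account` is an illustrative helper.

```python
import pandas as pd

# Stand-in for the DataFrame from this step; the columns are hypothetical.
df = pd.DataFrame(
    {
        "account_id": ["a1", "a2", "a1"],
        "amount": [10.0, 5.5, 4.5],
    }
)


def total_by_account(frame):
    """Sum the amount column per account."""
    return frame.groupby("account_id")["amount"].sum()


totals = total_by_account(df)
print(totals)
```

With the real data, replace the stand-in with the `df` created from the query result and adjust the column names to match.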