How Streambased works
A Unified Logical Layer For Data with no ETL
Streambased makes real-time and historical data behave like a single Iceberg table or a single Kafka topic, eliminating ETL, preserving performance, and unifying streaming + batch workloads on one logical layer.

Streambased Platform consists of 2 services:

Surfacing Kafka data as Iceberg
Keep dashboards and reports aligned with live data. Every Kafka topic is instantly available in Iceberg, so teams can query fresh events without waiting for pipelines to finish.

Surfacing Iceberg data as Kafka
Keep dashboards and reports aligned with live data. Every Kafka topic is instantly available in Iceberg, so teams can query fresh events without waiting for pipelines to finish.
I.S.K. - One Table Across All Time
Streambased I.S.K. presents a set of Iceberg tables composed of a section of real-time data from Kafka (the “hotset“) and a section of physical Iceberg data (the “coldset“).
Tables in I.S.K. combine these two sections in a way that is completely transparent to any clients interacting with it (it just looks like a regular Iceberg table).
The I.S.K. architecture consists of the following components:
A Storage Gateway
Iceberg is expecting files so I.S.K. must have a way to provide a file based interface to engines. I.S.K. presents an Amazon S3 compatible API to engines that can serve both metadata and data files with data sourced from Kafka.
Keep dashboards and reports aligned with live data. Every Kafka topic is instantly available in Iceberg, so teams can query fresh events without waiting for pipelines to finish.
An Iceberg Catalog
I.S.K. presents a simple, read only, catalog for Kafka data, this is the entrypoint for Iceberg engines.
Keep dashboards and reports aligned with live data. Every Kafka topic is instantly available in Iceberg, so teams can query fresh events without waiting for pipelines to finish.
A Cache
To reduce impact on the Kafka cluster and improve Iceberg performance, I.S.K. caches files served by the storage gateway. These files represent sections of immutable Kafka log and so can be cached and invalidated at will.
Keep dashboards and reports aligned with live data. Every Kafka topic is instantly available in Iceberg, so teams can query fresh events without waiting for pipelines to finish.
An indexing engine
Most Iceberg queries will not address the entire dataset. The Kafka API does not allow access patterns that easily address subsets of data. To address this I.S.K. maintains indexes that map Iceberg partitions -> Kafka offsets, making Iceberg engines able to prune away the Kafka data they do not need.
Keep dashboards and reports aligned with live data. Every Kafka topic is instantly available in Iceberg, so teams can query fresh events without waiting for pipelines to finish.

K.S.I. - One Stream Across All Time
Streambased K.S.I. presents Kafka topics composed of a “hotset” section of data served directly from Kafka and a “coldset” section served from Iceberg.
Kafka’s partition and offset concepts are mapped from columns in the Iceberg data allowing Kafka clients to interact with them as if they were Kafka topics.
The K.S.I. architecture consists of:
An Iceberg Engine
Required to fetch table formatted data
Keep dashboards and reports aligned with live data. Every Kafka topic is instantly available in Iceberg, so teams can query fresh events without waiting for pipelines to finish.
A Row Processor
This component reformats the column oriented Iceberg data into the key/value based messages Kafka clients expect. Governance steps like Schema Registry integration are applied here too.
Keep dashboards and reports aligned with live data. Every Kafka topic is instantly available in Iceberg, so teams can query fresh events without waiting for pipelines to finish.
A Proxy (we use the open source Kroxylicious)
Most requests/responses will be passed through to the underlying Kafka cluster but fetch requests that reference cold stored Iceberg data will be served by K.S.I. and not the underlying cluster.
Keep dashboards and reports aligned with live data. Every Kafka topic is instantly available in Iceberg, so teams can query fresh events without waiting for pipelines to finish.

Let’s find the right solution for your data
We’re here to help you unlock the full potential of your streaming data. Tell us about your challenges or ideas — and let’s explore how Streambased can support your business.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.