How Streambased works

A Unified Logical Layer For Data with no ETL

Streambased makes real-time and historical data behave like a single Iceberg table or a single Kafka topic, eliminating ETL, preserving performance, and unifying streaming + batch workloads on one logical layer.

The Streambased platform consists of two services:

Surfacing Kafka data as Iceberg
Keep dashboards and reports aligned with live data. Every Kafka topic is instantly available in Iceberg, so teams can query fresh events without waiting for pipelines to finish.

Surfacing Iceberg data as Kafka
Give streaming applications access to history. Every Iceberg table is available as a Kafka topic, so consumers can replay historical data with the same clients and protocol they use for live events.

I.S.K. - One Table Across All Time

Streambased I.S.K. presents a set of Iceberg tables composed of a section of real-time data from Kafka (the hotset) and a section of physical Iceberg data (the coldset).

Tables in I.S.K. combine these two sections in a way that is completely transparent to any client interacting with them: each one just looks like a regular Iceberg table.

The I.S.K. architecture consists of the following components:

A Storage Gateway
Iceberg expects files, so I.S.K. must provide a file-based interface to engines. I.S.K. presents an Amazon S3-compatible API that can serve both metadata and data files, with the data sourced from Kafka.
An Iceberg Catalog
I.S.K. presents a simple, read-only catalog for Kafka data; this is the entrypoint for Iceberg engines.
A Cache
To reduce impact on the Kafka cluster and improve Iceberg performance, I.S.K. caches files served by the storage gateway. These files represent sections of the immutable Kafka log and so can be cached and invalidated at will.
An indexing engine
Most Iceberg queries will not touch the entire dataset, but the Kafka API does not offer access patterns that easily target subsets of data. To bridge this gap, I.S.K. maintains indexes that map Iceberg partitions to Kafka offsets, enabling Iceberg engines to prune away the Kafka data they do not need.
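As a toy illustration (all names and values here are invented for the example, not the Streambased API), such an index lets a query fetch only the offset ranges its partition filter touches:

```python
# Hypothetical index built ahead of time:
# Iceberg partition value -> (first Kafka offset, last Kafka offset).
partition_index = {
    "2024-06-01": (0, 4_999),
    "2024-06-02": (5_000, 11_499),
    "2024-06-03": (11_500, 17_203),
}

def offsets_for_partitions(index, wanted):
    """Return only the Kafka offset ranges a query actually needs."""
    return {p: index[p] for p in wanted if p in index}

# A query filtered to a single day reads one offset range, not the whole log.
ranges = offsets_for_partitions(partition_index, {"2024-06-02"})
```

Everything outside the returned ranges is pruned before any Kafka fetch happens.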

K.S.I. - One Stream Across All Time

Streambased K.S.I. presents Kafka topics composed of a “hotset” section of data served directly from Kafka and a “coldset” section served from Iceberg.

Kafka’s partition and offset concepts are mapped from columns in the Iceberg data, allowing Kafka clients to interact with these tables as if they were Kafka topics.

The K.S.I. architecture consists of:

An Iceberg Engine
Required to fetch table-formatted data from the underlying Iceberg infrastructure.
A Row Processor
This component reformats the column oriented Iceberg data into the key/value based messages Kafka clients expect. Governance steps like Schema Registry integration are applied here too.
A Proxy (we use the open-source Kroxylicious)
Most requests/responses will be passed through to the underlying Kafka cluster but fetch requests that reference cold stored Iceberg data will be served by K.S.I. and not the underlying cluster.


Streambased composes a dataset made up of real-time data (from Apache Kafka) and historical data (from Apache Iceberg) with the design goals of:

A single source of truth – consistent data across streaming and analytical use cases.
Simplified architecture – fewer hops, fewer systems, reduced maintenance burden.
Zero-latency data access – using analytical tooling to query up-to-date “real-time” data without waiting for batch ingestion.
Elimination of ETL and data duplication – composition eliminates “ahead of time” data movement pipelines, increasing availability and reducing operational overhead.
Cost-efficient storage – leveraging Apache Iceberg for scalable, low-cost long-term retention for Kafka use cases.
Consistent governance and schema management – enforcing access control and data structure across underlying storage platforms.

Operational and analytical architecture unified - As Iceberg

Streambased serves Iceberg engines via I.S.K. (Iceberg Service for Kafka). Iceberg tables served by this component seamlessly combine Kafka and Iceberg data into a single logical view. From the Iceberg engine perspective, they behave exactly like standard tables, with no additional complexity or integration overhead.

The I.S.K. architecture consists of the following:

An Iceberg catalog: 

I.S.K. presents a simple, read-only catalog compliant with the Iceberg REST specification. This is the entrypoint for Iceberg engines and presents three namespaces:

1. Hotset – these tables represent only Kafka data.
2. Coldset – these tables represent only Iceberg data.
3. Mergedset – these tables represent a union of Kafka and Iceberg data, deduplicated by Kafka partition/offset.
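A minimal sketch of the mergedset rule (illustrative Python, not Streambased internals): union the two sets and keep exactly one row per (partition, offset), preferring the live Kafka copy on collision:

```python
def merge_dedup(hotset, coldset):
    """Union hot (Kafka) and cold (Iceberg) rows, deduplicated by
    (kafka_partition, kafka_offset). Hot rows win on collision."""
    merged = {}
    for row in coldset + hotset:  # hotset last, so it overwrites duplicates
        merged[(row["partition"], row["offset"])] = row
    return sorted(merged.values(), key=lambda r: (r["partition"], r["offset"]))

# Offset 2 exists in both sets (already materialised to Iceberg, still in Kafka);
# the merged table shows it exactly once.
cold = [{"partition": 0, "offset": 1, "v": "a"},
        {"partition": 0, "offset": 2, "v": "b"}]
hot  = [{"partition": 0, "offset": 2, "v": "b"},
        {"partition": 0, "offset": 3, "v": "c"}]
rows = merge_dedup(hot, cold)
```

The (partition, offset) pair acts as a natural primary key because Kafka never reuses an offset within a partition.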

A storage gateway

I.S.K. presents an Amazon S3-compatible API for serving Iceberg metadata and table data. When required, the storage gateway performs any necessary transformation (for instance, from the Kafka data format to Iceberg-native Parquet).

A hotset cache

To reduce impact on the Kafka cluster and improve Iceberg performance, I.S.K. caches files served by the storage gateway. These files represent sections of the immutable Kafka log and so can be cached and invalidated as required.
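The cacheability argument can be sketched in a few lines: because a closed (topic, partition, offset-range) slice of the log never changes, a file built from it can be reused until it is dropped. The class below is a toy illustration, not the Streambased cache:

```python
class HotsetCache:
    """Toy cache for files derived from immutable Kafka log ranges.
    A (topic, partition, start, end) slice never changes once written,
    so entries can be cached and dropped freely."""

    def __init__(self):
        self._files = {}

    def get_or_build(self, topic, partition, start, end, build):
        key = (topic, partition, start, end)
        if key not in self._files:
            self._files[key] = build()  # hit Kafka only on a cache miss
        return self._files[key]

    def invalidate(self, topic, partition):
        self._files = {k: v for k, v in self._files.items()
                       if (k[0], k[1]) != (topic, partition)}

cache = HotsetCache()
builds = []
make = lambda: builds.append(1) or b"parquet-bytes"  # records each real build
f1 = cache.get_or_build("orders", 0, 0, 999, make)
f2 = cache.get_or_build("orders", 0, 0, 999, make)  # served from cache
```

The second request never touches Kafka; the build function runs once per distinct offset range.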

Operational and analytical architecture unified - As Kafka

Streambased serves Kafka clients via K.S.I. (Kafka Service for Iceberg). Kafka topics served by this component represent the same core Streambased seamless logical view of Kafka and Iceberg, this time served via the Kafka protocol. Kafka’s partition and offset concepts are mapped from columns in the Iceberg data, allowing Kafka clients to interact with these topics as if they were native Kafka topics.

The K.S.I. architecture consists of:

A proxy 
Streambased serves Kafka requests via a proxy component that sits between the underlying Kafka/Iceberg infrastructure and clients. This proxy (based on the open-source Kroxylicious project) intercepts requests that can be served from the Streambased logical view and satisfies them via a separate code path. All requests that are not relevant to the logical view are simply passed through.
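A toy sketch of that routing rule (illustrative only; the real proxy operates at the Kafka wire-protocol level): fetches that fall inside the cold, Iceberg-backed offset range are answered locally, and everything else passes through to the cluster:

```python
def route_fetch(topic, offset, coldset_end, serve_from_iceberg, pass_through):
    """Toy routing rule: offsets at or below the coldset boundary are
    served from Iceberg; later offsets pass through to Kafka unchanged."""
    if offset <= coldset_end:
        return serve_from_iceberg(topic, offset)
    return pass_through(topic, offset)

# Offsets up to 1000 have aged out of Kafka into Iceberg in this example.
res = route_fetch("orders", 42, coldset_end=1_000,
                  serve_from_iceberg=lambda t, o: ("iceberg", t, o),
                  pass_through=lambda t, o: ("kafka", t, o))
```

From the client's perspective both paths return ordinary fetch responses; only the data source differs.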
An Iceberg engine
In order to fetch Iceberg table records (that will later be transformed and served as Kafka), K.S.I. must employ an Iceberg engine to query the underlying infrastructure. Streambased supports external engines (such as Trino or Spark) for this purpose, or it can use a small embedded service. 
A row processor
This component reformats the column-oriented Iceberg data into the key/value-based messages that Kafka clients expect. Governance steps like Schema Registry integration are applied here too.
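As a rough illustration (plain Python, assuming the table carries a field usable as the message key), the pivot from column-oriented data to Kafka-style records looks like this:

```python
def columns_to_messages(columns, key_field):
    """Pivot column-oriented data (a dict of equal-length lists, as an
    Iceberg scan might yield) into the key/value records Kafka clients expect."""
    n = len(next(iter(columns.values())))
    messages = []
    for i in range(n):
        row = {name: values[i] for name, values in columns.items()}
        messages.append({"key": row[key_field], "value": row})
    return messages

cols = {"user": ["alice", "bob"], "amount": [10, 25]}
msgs = columns_to_messages(cols, key_field="user")
```

In the real component this is also where governance steps such as Schema Registry serialisation would be applied.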

Unified governance across Iceberg and Kafka

Streambased enforces a single layer of schema management, access control and data policies that span both Kafka and Iceberg. This ensures that data remains consistent and compliant as it is ingested by Kafka, ages, and then transitions from Kafka into Iceberg. By aligning governance across both domains, organisations gain a single source of truth with predictable data contracts, simplified auditing and reduced operational complexity.

Streambased integrates with external data structure providers such as Confluent’s Schema Registry and leverages schema-evolution mechanisms in both Kafka and Iceberg to automatically ensure consistency. 

Pain-free transfer from Kafka to Iceberg

Traditional Kafka-to-Iceberg pipelines (e.g. Kafka Connect) involve uncomfortable compromises between latency, file layout and differing data-structure concepts. The Streambased composed view gives all applications access to Kafka and Iceberg data without requiring data transfer, sidestepping these compromises and delivering the full promise of a combined real-time and analytical view.

Streambased addresses the common Iceberg transfer pains of small files and snapshots:

Small files

Streaming pipelines write data to Iceberg as it arrives, generating many small, inefficient files. Streambased avoids this by exposing Kafka data as logical Iceberg tables, eliminating the need for immediate physical writes.

Snapshots

When new data is written to Iceberg, associated metadata is created alongside it. Snapshots are metadata recording which data was inserted at which times, enabling Iceberg’s time-travel feature. Like the small-files problem, a large number of snapshots degrades query performance and is costly to clean up. Streambased treats snapshots as a logical construct, allowing them to be created, merged or removed with minimal overhead.

High-performance data access via secondary indexing

Secondary indexing means creating additional structures that allow you to efficiently look up or filter data based on different attributes. In Kafka, data is naturally organised by offset, but a secondary index might allow fast access by another field (such as user, timestamp, or status) without requiring queries to scan everything. The same applies to Iceberg data via its partitioning feature.
Streambased creates secondary index structures as Iceberg tables that can be used to greatly accelerate queries that don’t naturally match Kafka’s produce/consume access pattern.
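A minimal sketch of the idea (the real indexes are stored as Iceberg tables; this toy version uses an in-memory dict): map each field value to the offsets that contain it, so a lookup touches only those offsets:

```python
from collections import defaultdict

def build_secondary_index(records, field):
    """Map each value of `field` to the Kafka offsets holding it, so a
    lookup by that field avoids scanning the whole log."""
    index = defaultdict(list)
    for rec in records:
        index[rec[field]].append(rec["offset"])
    return dict(index)

log = [{"offset": 0, "status": "ok"},
       {"offset": 1, "status": "error"},
       {"offset": 2, "status": "ok"}]
idx = build_secondary_index(log, "status")
# A query for errors now reads one offset instead of three.
```

Because the index itself is just a table, an Iceberg engine can join against it like any other data.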

Zero-latency CDC

CDC (Change Data Capture) involves continuously streaming inserts, updates, and deletes from source systems to create low latency materialised copies of the source data. It is a very common pattern with Apache Kafka.

Streambased takes a fundamentally different approach to CDC by avoiding the need to fully materialise streaming data into Iceberg ahead of time. Instead of writing every change through a sink connector, Streambased composable views combine pre-materialised data in Iceberg with live data directly from Kafka.

The result is immediate data freshness, ensuring events are queryable as soon as they arrive in Kafka. By deferring materialisation, Streambased also significantly reduces the usual streaming-to-Iceberg pains.

Overall, this approach simplifies the architecture, reduces infrastructure and maintenance overhead, and delivers truly real-time analytics without the trade-offs of traditional CDC pipelines.

Let’s find the right solution for your data

We’re here to help you unlock the full potential of your streaming data. Tell us about your challenges or ideas — and let’s explore how Streambased can support your business.
