Operational and Analytical data system convergence, the coming revolution that already happened

Exploring the transformative journey from distinct operational and analytical data systems to a unified, versatile approach in data infrastructure.

July 5, 2024
4 Minutes

The Evolution of Data Infrastructure

In the world of data infrastructure, an operational system focuses on a short-term snapshot of the data needed by interactive clients. Your website, inventory, shipping, and account services are likely backed by operational stores, and many embed their own. These systems are the glue that makes the world go round.

Operational vs. Analytical Systems

Analytical systems focus on longer-term data and problems. Expect to find a much larger historical data set here that can be used for answering more strategic questions such as “what was my total revenue in 2021?” and “what is the average size of a shopping basket in Manchester?”.
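To make the contrast concrete, here is a minimal Python sketch using an in-memory SQLite database. The `orders` table, its columns, the sample rows, and the function names are all invented for illustration. The point is only the shape of the queries: an operational lookup touches one record, while an analytical question scans and aggregates the history.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, city TEXT, year INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (id, city, year, total) VALUES (?, ?, ?, ?)",
    [
        (1, "Manchester", 2021, 40.0),
        (2, "Manchester", 2021, 60.0),
        (3, "London", 2021, 100.0),
        (4, "Manchester", 2022, 30.0),
    ],
)


def order_total(order_id):
    """Operational style: fetch a single record for an interactive client."""
    row = conn.execute(
        "SELECT total FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    return row[0] if row else None


def revenue_for_year(year):
    """Analytical style: aggregate every order placed in the given year."""
    return conn.execute(
        "SELECT COALESCE(SUM(total), 0) FROM orders WHERE year = ?", (year,)
    ).fetchone()[0]


def average_basket(city):
    """Analytical style: average basket size across all history for a city."""
    return conn.execute(
        "SELECT AVG(total) FROM orders WHERE city = ?", (city,)
    ).fetchone()[0]
```

Calling `order_total(3)` reads one row by primary key, whereas `revenue_for_year(2021)` and `average_basket("Manchester")` have to consider every matching order in the table.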

This nice clear divide between specialist operational systems and specialist analytical systems lasted for all of 5 seconds, as you’d imagine it would ;-). Analytical systems soon introduced caching and indexing features designed specifically to make them more suitable for operational workloads and, conversely, operational systems began expanding their storage capabilities and retaining data for longer periods.

Blurring the Lines Between Systems

To give a few examples, Apache HBase and Apache Impala are big data analytical systems making moves towards the operational side, while Elasticsearch and Prometheus are good examples of operational systems that also offer large-scale analytical capabilities.

The ecosystem is now saturated with data storage and processing products, to the point where a medium-sized organisation could expect to have many of these systems deployed, each claiming (usually truthfully) to be able to handle all of the business cases itself.

A New Perspective on Data Infrastructure

Truly, we can no longer think in terms of analytical and operational systems, but instead in terms of data storage formats and processing technology. To some, this may be a revolutionary way of thinking, but format-focused projects such as Apache Iceberg and Apache Hudi and processing-focused projects such as DuckDB and Apache Calcite demonstrate that this shift has already happened. The revolution has been and gone.

The Rise of Microservices and Event Streaming Systems

Rest assured, there’s still a hill to climb. The rise in popularity of microservices architectures (a paradigm where small, well-defined services coordinate with each other to achieve business goals) pushed a further shift: these services communicate with each other via event streaming systems, and these systems offer interesting new avenues for both data storage and processing.

The Unique Benefits of Event Streaming for Analytical Purposes

Event streaming systems are excellent at operational tasks: they convey events between services quickly and reliably. The benefits for analytical cases are more subtle. Data is exchanged as events rather than facts, and the aggregate of all events on a given subject represents its current state.
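The idea that state is the aggregate of events can be sketched in a few lines of Python. The inventory events below, their field names (`sku`, `type`, `qty`), and the helper functions are hypothetical and exist only to illustrate the fold. No real event streaming client is involved.

```python
from collections import defaultdict

# Hypothetical inventory events in arrival order; field names are assumptions.
EVENTS = [
    {"sku": "apple", "type": "stocked", "qty": 10},
    {"sku": "pear", "type": "stocked", "qty": 4},
    {"sku": "apple", "type": "sold", "qty": 3},
    {"sku": "apple", "type": "sold", "qty": 2},
]


def current_state(events):
    """Fold an ordered event log into the current stock level per SKU."""
    stock = defaultdict(int)
    for event in events:
        if event["type"] == "stocked":
            stock[event["sku"]] += event["qty"]
        elif event["type"] == "sold":
            stock[event["sku"]] -= event["qty"]
        else:
            raise ValueError(f"unknown event type: {event['type']}")
    return dict(stock)


def state_as_of(events, count):
    """Replay only the first `count` events to see the state at that point."""
    return current_state(events[:count])
```

Because the log keeps every event rather than only the latest fact, the same data answers both "what is in stock now?" (`current_state(EVENTS)`) and "what was in stock earlier?" (`state_as_of(EVENTS, 2)`), which is where the analytical value comes from.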


Introducing the Streaming Data Lake

Here at Streambased, we are pioneering the Streaming Data Lake: the use of event streaming platforms (we love Apache Kafka) for analytical purposes. Recent versions of Apache Kafka have introduced tiered storage, the ability to keep your data in Kafka indefinitely in a durable and cost-effective way.

Streambased: Revolutionizing Data Analysis

Streambased piggybacks on this new feature to provide the processing side of things: we surface Kafka data via SQL and JDBC into the tools and working practices analysts love. Finally, we add a good chunk of database performance tech learned from more traditional analytics stores. Enjoy indexes, statistics, and pre-aggregation never before seen on top of raw Kafka data.
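As a rough illustration of why statistics help on an append-only log, the Python sketch below keeps a minimum and maximum per batch of records and uses them to skip batches that cannot match a filter. This is a generic technique from traditional analytics stores, not a description of how Streambased is implemented. The `BatchStats` class, the function names, and the batch layout are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BatchStats:
    start: int      # position of the first record in the batch
    end: int        # position one past the last record in the batch
    min_value: int  # smallest value seen in the batch
    max_value: int  # largest value seen in the batch


def build_stats(values, batch_size):
    """Summarise an ordered log of values as min/max statistics per batch."""
    stats = []
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        stats.append(BatchStats(start, start + len(chunk), min(chunk), max(chunk)))
    return stats


def filter_equal(values, stats, target):
    """Return (matching positions, number of batches actually scanned)."""
    matches = []
    scanned = 0
    for batch in stats:
        if target < batch.min_value or target > batch.max_value:
            continue  # the statistics prove this batch holds no match
        scanned += 1
        for position in range(batch.start, batch.end):
            if values[position] == target:
                matches.append(position)
    return matches, scanned


# Hypothetical log of integer values, split into batches of three records.
LOG = [1, 2, 3, 10, 11, 12, 3, 4, 5, 20, 21, 22]
STATS = build_stats(LOG, batch_size=3)
```

With this data, a filter for the value 11 only has to scan one of the four batches, because the other three are ruled out by their min/max ranges alone.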


The Future of Data Architecture

What is clear from the above is that the revolution has begun: future architectures think in much more general terms about data storage and processing and less about specialized, use case-based technologies. Where it leads remains to be seen, but it’s an exciting journey we are thrilled to be a part of.


Copyright 2024 Streambased Platform Limited. Company Number 14709247.