Dealing With Data Fragmentation

Written by

Brenton Ough

July 22, 2021


One of the biggest benefits of streaming over broadcasting is access to data. Every component within the workflow throws off some kind of information which can be tapped into, visualized, and used to make business decisions. Sometimes this data is free and clear, available through an API. Other times, it’s contained in a black box and available only through the vendor’s visualization tools. Regardless of how it can be accessed, though, it’s there, waiting to be drawn into the business intelligence tools through which you make smarter decisions. 

But hold on just a second. As you may already know, it’s not quite that easy. Sure, you can get access to a bunch of different data sources, but that’s the issue: they are all different. Different schemas, different names, different ways to store and order. And these differences can create a world of problems.

When Data Sources Clash

Imagine the following scenario: you’re running a customer-service-focused business, such as a streaming platform. When there are issues with the service you provide, say, trouble delivering a video, two things are essential: understanding the issue fast and resolving it even faster.

Sound familiar? Now imagine this added wrinkle. The data required to complete these two steps comes from dozens of different sources. In this case, operations engineers, tasked with ensuring a high-performing streaming service, must spend valuable time “connecting the dots” rather than addressing the issues and optimizing the streaming components.

The outcome? Churn goes up and revenue goes down (especially ad revenue, since fewer eyeballs mean fewer impressions).

The first challenge in this scenario, getting and understanding the data quickly, can be handled in a number of ways, such as high-speed data acquisition tools and protocols (like WebRTC). But the second, resolving service-impacting problems as fast as possible, is hampered by the resources wasted post-processing data to normalize multiple sources into a cohesive “source of record”. If operations engineers have to sift through multiple sources to work out how they connect, and which data points they should connect on, when those data points may be named or calculated differently, they can’t simultaneously be identifying root causes and moving to fix them. So what can you do? Create a single data source!
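To make the problem concrete, here is a minimal sketch in Python of what “connecting the dots” looks like when two sources report the same metric under different names, units, and shapes. The field names and schemas below are invented for illustration; they don’t come from any particular vendor.

```python
from datetime import datetime

# Hypothetical payloads: a CDN log and a player-analytics feed both
# report rebuffering for the same stream, but in different shapes.
cdn_record = {
    "ts": 1626912000,              # epoch seconds
    "stream": "live-sports-01",
    "rebuffer_ms": 4200,           # milliseconds
}

player_record = {
    "timestamp": "2021-07-22T00:00:00+00:00",  # ISO 8601
    "asset_id": "live-sports-01",
    "metrics": {"bufferingTime": 4.2},         # seconds, nested
}

def normalize_cdn(rec):
    """Map the CDN shape onto a common 'source of record' schema."""
    return {
        "stream_id": rec["stream"],
        "timestamp": rec["ts"],
        "rebuffer_seconds": rec["rebuffer_ms"] / 1000,
        "source": "cdn",
    }

def normalize_player(rec):
    """Map the player-analytics shape onto the same schema."""
    ts = datetime.fromisoformat(rec["timestamp"])
    return {
        "stream_id": rec["asset_id"],
        "timestamp": int(ts.timestamp()),
        "rebuffer_seconds": rec["metrics"]["bufferingTime"],
        "source": "player",
    }

# Only once both records share a schema can they be joined on
# stream_id and timestamp to tell one coherent story.
unified = [normalize_cdn(cdn_record), normalize_player(player_record)]
```

Every one of those mapping decisions, which field is the join key, which unit is canonical, is work an operations engineer shouldn’t have to redo in the middle of an incident.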

Paving the Way to a Single Data Source

Since wrestling with multiple data sources creates a problem with resolving issues quickly, it should be your number one priority to fix it. Doing so, however, requires a different way of thinking.

There has been an evolution in observability over the years. First came a variety of separate visualization tools: when data sources weren’t available programmatically and each vendor forced the customer into its own visualization tool, operations engineers stared at dozens of screens trying to spot the pattern.

Next, vendors began to open up their technologies to programmatic access, meaning operators were no longer required to use a proprietary visualization tool unless they wanted to. As a result, savvy streaming operators began to use third-party tools like Datadog, Tableau, and Looker to build their own custom dashboards against these programmatic datasets. But this stage of observability still had to deal with data sources that weren’t aligned: even as operators dumped every data source into a data lake for their new visualization tools to operate against, they were still contending with conflicting data.

Finally, we arrive at today. Streaming operators are using data lakes. They are pulling in all of their data programmatically. They are using customized visualization tools. What’s different, though, is that they are connecting it all to a monitoring harness.

The Key to Resolving Data Fragmentation?

See, the issue with data fragmentation isn’t just the speed at which issues can get resolved. Yes, that’s critical, but the bigger problem is the ripple effect. As technologies in the workflow are upgraded or changed, they may require new visualization tools. That means training: not just how to use the new tools, but how to enable observability and how to connect the data points together all over again.

A monitoring harness, though, is a much different approach. It’s the idea of creating a monitoring pipeline within an OTT workflow in which all of the components connect programmatically. When a component needs to be swapped out or upgraded, it is simply connected to the harness. And at the end of the harness sits the same visualization tool, so no new training is needed. What’s better? The only adjustments required are within the logic of the harness: ensuring the data coming from the connected component is normalized. The result is much more efficient streaming, as operations engineers gain the observability they need, at the speed they need it, without any concern about the stack underneath.
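One way to picture the harness is as a registry of per-component adapters that all emit the same normalized event, so the visualization layer on top never changes when a component is swapped. The following is a rough Python sketch of that idea, not any particular product’s API; the class, payloads, and field names are assumptions for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical monitoring-harness sketch: each workflow component
# registers an adapter that converts its raw payload into one
# normalized event shape. Swapping a component means writing a new
# adapter; the dashboards downstream are untouched.

Adapter = Callable[[Dict[str, Any]], Dict[str, Any]]

class MonitoringHarness:
    def __init__(self) -> None:
        self._adapters: Dict[str, Adapter] = {}

    def register(self, component: str, adapter: Adapter) -> None:
        """Plug a component into the harness (or replace its adapter)."""
        self._adapters[component] = adapter

    def ingest(self, component: str, raw: Dict[str, Any]) -> Dict[str, Any]:
        """Normalize a raw payload before it reaches the data lake / dashboards."""
        event = self._adapters[component](raw)
        event["component"] = component
        return event  # in practice this would be written to the data lake

harness = MonitoringHarness()

# Original encoder vendor reports bitrate as "br"
harness.register("encoder", lambda raw: {"bitrate_kbps": raw["br"]})

# New encoder vendor swapped in: only the adapter changes
harness.register("encoder", lambda raw: {"bitrate_kbps": raw["outputKbps"]})

print(harness.ingest("encoder", {"outputKbps": 4500}))
# {'bitrate_kbps': 4500, 'component': 'encoder'}
```

The design choice is the point: the normalization logic lives in one place, at the point of connection, rather than being rebuilt in every dashboard each time the stack underneath changes.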

Don’t Be a Victim of Data Fragmentation

The standardization of data within the streaming industry is a long way off. As such, streaming operators will contend with data fragmentation for the foreseeable future. There will always be work to be done connecting data sources together so that observability can happen. But the main question is where that work is completed. 

If it’s at the top of the monitoring stack, with customized visualizations and logic to normalize data, every new technology in the workflow will slow operations’ ability to ensure the highest-quality streaming experience. But if it’s within a harness, software engineers can ensure datasets connect within the data lake at the time of API connection. As a result, operations engineers always use the same tool, the same familiar dashboards, and the same comfortable processes, so that viewers always receive the same great service.
