Livepeer's network data is fragmented across isolated systems, making it impossible to answer basic questions about network performance, AI model adoption, or orchestrator health. This hackathon project takes the first step toward democratizing network analytics by building a unified data pipeline that anyone can query.
<aside> 🎯
What We're Building: A proof-of-concept pipeline using StreamR → ClickHouse → Metabase that demonstrates how network-wide data can be collected, stored, and made accessible to the entire Livepeer community.
</aside>
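To make the pipeline concrete, here is a minimal sketch of the ingestion step in Python: a hypothetical AI-job event arriving from a StreamR stream is flattened into an illustrative ClickHouse table via the `clickhouse-connect` client. The stream contents, field names, and table schema are placeholder assumptions for the proof of concept, not the final design.

```python
import datetime

import clickhouse_connect

# Connect to a local ClickHouse instance (host/port are placeholders).
ch = clickhouse_connect.get_client(host="localhost", port=8123)

# Illustrative table for AI job events; the real schema is still to be designed.
ch.command("""
    CREATE TABLE IF NOT EXISTS ai_jobs (
        event_time   DateTime,
        orchestrator String,
        gateway      String,
        model_id     String,
        duration_ms  UInt32
    ) ENGINE = MergeTree ORDER BY (model_id, event_time)
""")

def handle_event(event: dict) -> None:
    """Flatten one stream message and insert it as a row in ClickHouse."""
    ch.insert(
        "ai_jobs",
        [[
            event["timestamp"],
            event["orchestrator"],
            event["gateway"],
            event["model_id"],
            event["duration_ms"],
        ]],
        column_names=["event_time", "orchestrator", "gateway", "model_id", "duration_ms"],
    )

# In the real pipeline this callback would be registered with a StreamR
# subscription; here it is called directly with a made-up event.
handle_event({
    "timestamp": datetime.datetime(2024, 1, 1, 12, 0, 0),
    "orchestrator": "0xorch...",
    "gateway": "0xgw...",
    "model_id": "stabilityai/sd-turbo",
    "duration_ms": 850,
})
```

Once events flow into a table like this, Metabase can sit directly on top of ClickHouse, so dashboards and ad-hoc queries come for free.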
This isn't just about better dashboards—it's the foundation for the Livepeer Data Lake, where orchestrators, gateways, developers, and researchers can all access the insights they need to optimize the network.
<aside> ⚠️
The core problem: today it is impossible to answer even simple questions like "How many AI jobs have been run in the last 24 hours for each model?" at the network level.
</aside>
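With events landing in a shared table, that question stops being impossible. Assuming the illustrative `ai_jobs` schema from the sketch above, it reduces to a single aggregate query that Metabase, or anyone with SQL access, could run:

```python
import clickhouse_connect

ch = clickhouse_connect.get_client(host="localhost", port=8123)

# Jobs per model over the last 24 hours, against the illustrative ai_jobs table.
result = ch.query("""
    SELECT model_id, count() AS jobs_24h
    FROM ai_jobs
    WHERE event_time >= now() - INTERVAL 24 HOUR
    GROUP BY model_id
    ORDER BY jobs_24h DESC
""")

for model_id, jobs in result.result_rows:
    print(f"{model_id}: {jobs} jobs in the last 24h")
```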
Our network data is scattered across isolated islands:
We're flying blind on fundamental network questions: