𝗗𝗲𝘀𝗶𝗴𝗻𝗶𝗻𝗴 𝗮𝗻 𝗢𝗯𝘀𝗲𝗿𝘃𝗮𝗯𝗶𝗹𝗶𝘁𝘆-𝗗𝗿𝗶𝘃𝗲𝗻 𝗗𝗮𝘁𝗮 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲 𝗳𝗼𝗿 𝗥𝗲𝗮𝗹-𝗧𝗶𝗺𝗲 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀
To build a data pipeline for real-time analytics, you need a system that handles high-velocity events, processes them with low latency, and gives operators actionable insights.
Here are the key components:
- Ingest layer: streaming source adapters like Kafka or Kinesis
- Processing layer: stream processing for aggregations and enrichment
- Storage layer: immutable event store for replayability and read-optimized stores for analytics
- Serving/query layer: materialized views and pre-aggregated tables
- Observability layer: tracing, metrics, logs, dashboards, and alerting
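As a rough illustration of how these layers fit together, here is a minimal in-memory sketch. Everything in it is a hypothetical stand-in: `Event`, `Pipeline`, and the metric names are invented for this example, a plain list plays the role of the immutable event store, and a dictionary plays the role of the materialized view. A real deployment would put Kafka or Kinesis in front and a stream processor in the middle.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical event shape used only for this sketch."""
    key: str
    value: float
    ts_ms: int  # event time in milliseconds


class Pipeline:
    """Toy stand-in for the ingest, processing, storage, serving, and observability layers."""

    def __init__(self, window_ms: int = 1000) -> None:
        self.window_ms = window_ms
        # Storage layer: append-only log that is never mutated, so it can be replayed.
        self.event_log: List[Event] = []
        # Serving layer: pre-aggregated sums keyed by (key, window start).
        self.view: Dict[Tuple[str, int], float] = defaultdict(float)
        # Observability layer: simple counters (metric names are assumptions).
        self.metrics: Dict[str, int] = defaultdict(int)

    def ingest(self, events: Iterable[Event]) -> None:
        """Ingest layer: accept events, persist them, then hand them to processing."""
        for event in events:
            self.metrics["events_ingested"] += 1
            self.event_log.append(event)
            self._process(event)

    def _process(self, event: Event) -> None:
        """Processing layer: tumbling-window sum per key."""
        window_start = event.ts_ms - event.ts_ms % self.window_ms
        self.view[(event.key, window_start)] += event.value
        self.metrics["events_processed"] += 1

    def replay(self) -> None:
        """Rebuild the materialized view from the event log."""
        self.view.clear()
        for event in self.event_log:
            self._process(event)

    def query(self, key: str, start_ms: int, end_ms: int) -> float:
        """Time-bounded aggregation over windows starting in [start_ms, end_ms)."""
        return sum(
            total
            for (k, window_start), total in self.view.items()
            if k == key and start_ms <= window_start < end_ms
        )
```

Keeping the raw log separate from the view is what makes replay possible: the view can be dropped and rebuilt at any time without losing data.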
To get started, define your requirements:
- Ingestion rate: 100k events per second
- End-to-end latency: ≤ 300 ms
- Query patterns: time-bounded aggregations, with SLOs on tail latency
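It can help to write these targets down as code so they can be checked automatically. The sketch below is one possible way to do that, not a prescribed one: the `Requirements` class, the helper names, and the choice of p99 as the tail percentile are assumptions made for illustration. It also applies Little's law (items in flight = arrival rate × time in system) to show what the two numbers above imply together.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Requirements:
    """Targets taken from the list above; field names are hypothetical."""
    ingest_rate_eps: int = 100_000     # events per second
    latency_budget_ms: float = 300.0   # end-to-end latency ceiling


def events_in_flight(req: Requirements) -> float:
    """Little's law: rate multiplied by time in system (converted to seconds)."""
    return req.ingest_rate_eps * req.latency_budget_ms / 1000.0


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample set."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency_slo(
    samples_ms: Sequence[float], req: Requirements, pct: float = 99.0
) -> bool:
    """True if the chosen tail percentile stays within the latency budget."""
    return percentile(samples_ms, pct) <= req.latency_budget_ms
```

With the defaults, `events_in_flight(Requirements())` evaluates to 30,000: at 100k events per second, a pipeline running right at the 300 ms ceiling holds about that many events in transit at any moment, which is a useful starting point for sizing buffers and queues.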
Choose your observability outcomes:
- Sufficient telemetry to diagnose latency and data skew
- Quick root-cause analysis for outages
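To make these outcomes concrete, here is a small sketch of the kind of telemetry that could support them. It uses only the Python standard library; the `Telemetry` class, its method names, and the skew definition (busiest partition divided by the mean across observed partitions) are assumptions chosen for this example rather than an established standard.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, List


class Telemetry:
    """Hypothetical collector for per-stage latency and per-partition load."""

    def __init__(self) -> None:
        self.stage_latency_ms: Dict[str, List[float]] = defaultdict(list)
        self.partition_counts: Dict[int, int] = defaultdict(int)

    @contextmanager
    def timed(self, stage: str) -> Iterator[None]:
        """Record the wall-clock duration of the wrapped block under a stage name."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.stage_latency_ms[stage].append(elapsed_ms)

    def record_event(self, partition: int) -> None:
        """Count one event against the partition that handled it."""
        self.partition_counts[partition] += 1

    def skew_ratio(self) -> float:
        """Busiest partition divided by the mean; 1.0 means perfectly even load.

        Only partitions that received at least one event are counted.
        """
        counts = list(self.partition_counts.values())
        if not counts:
            return 1.0
        return max(counts) / (sum(counts) / len(counts))

    def slowest_stage(self) -> str:
        """Stage with the highest mean latency, a first lead for root-cause analysis."""
        if not self.stage_latency_ms:
            raise ValueError("no stage timings recorded")
        return max(
            self.stage_latency_ms,
            key=lambda s: sum(self.stage_latency_ms[s]) / len(self.stage_latency_ms[s]),
        )
```

Wrapping each stage in `with telemetry.timed("enrich"):` attributes latency to a specific stage, and a skew ratio well above 1.0 points to a hot key or an uneven partitioning scheme.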