𝗗𝗲𝘀𝗶𝗴𝗻𝗶𝗻𝗴 𝗮𝗻 𝗢𝗯𝘀𝗲𝗿𝘃𝗮𝗯𝗶𝗹𝗶𝘁𝘆-𝗗𝗿𝗶𝘃𝗲𝗻 𝗗𝗮𝘁𝗮 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲 𝗳𝗼𝗿 𝗥𝗲𝗮𝗹-𝗧𝗶𝗺𝗲 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀

To build a data pipeline for real-time analytics, you need a system that ingests high-velocity events, processes them with low latency, and gives operators actionable insights.

Here are the key components:

  • Ingest layer: streaming source adapters like Kafka or Kinesis
  • Processing layer: stream processing for aggregations and enrichment
  • Storage layer: immutable event store for replayability and read-optimized stores for analytics
  • Serving/query layer: materialized views and pre-aggregated tables
  • Observability layer: tracing, metrics, logs, dashboards, and alerting
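
As a rough illustration of how these layers fit together, here is a minimal in-memory sketch. It is a toy under stated assumptions, not a production design: the `Event` fields, the `Pipeline` class, and its method names are all hypothetical, the ingest step stands in for a Kafka or Kinesis consumer, and a plain list stands in for the immutable event store.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Event:
    key: str           # hypothetical grouping key, e.g. an event type
    value: float       # hypothetical numeric payload to aggregate
    event_time: float  # seconds since epoch, stamped by the producer


@dataclass
class Pipeline:
    # Storage layer: append-only log kept so views can be rebuilt by replay.
    event_log: list = field(default_factory=list)
    # Serving layer: pre-aggregated running totals per key.
    totals: dict = field(default_factory=lambda: defaultdict(float))
    # Observability layer: end-to-end latency of each event in milliseconds.
    latencies_ms: list = field(default_factory=list)

    def ingest(self, event: Event, now: Optional[float] = None) -> None:
        """Ingest layer stand-in: store the raw event, process it, record latency."""
        self.event_log.append(event)
        self._process(event)
        now = time.time() if now is None else now
        self.latencies_ms.append((now - event.event_time) * 1000.0)

    def _process(self, event: Event) -> None:
        """Processing layer: a running sum per key as the simplest aggregation."""
        self.totals[event.key] += event.value

    def query(self, key: str) -> float:
        """Serving layer: read the pre-aggregated value without touching the log."""
        return self.totals.get(key, 0.0)

    def replay(self) -> None:
        """Rebuild the serving view from the event log."""
        self.totals = defaultdict(float)
        for event in self.event_log:
            self._process(event)


pipeline = Pipeline()
pipeline.ingest(Event("checkout", 2.0, event_time=10.0), now=10.25)
pipeline.ingest(Event("checkout", 3.0, event_time=11.0), now=11.125)
print(pipeline.query("checkout"))  # 5.0
print(pipeline.latencies_ms)       # [250.0, 125.0]
```

The point of keeping the raw log separate from the totals is that the serving view can be thrown away and rebuilt with `replay()`, which is what the immutable event store in the list above makes possible.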

To get started, define your requirements:

  • Ingestion rate: 100k events per second
  • End-to-end latency: ≤ 300 ms
  • Query patterns: time-bounded aggregations, governed by SLOs for tail latency
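
It can help to turn these targets into numbers you can check. The sketch below encodes the ingestion rate and latency target from the list above, uses Little's law to estimate how many events are in flight at once, and checks a per-stage latency split against the 300 ms total. The `Requirements` class and the per-stage budgets are assumptions made for illustration; only the two headline figures come from the requirements above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    ingest_rate_eps: int = 100_000  # events per second, from the requirements
    max_latency_ms: int = 300       # end-to-end latency target, from the requirements

    def events_in_flight(self) -> int:
        """Little's law: items in the system = arrival rate x time in the system.

        Uses the latency target as the time in the system, so this is the
        count at the edge of the budget.
        """
        return self.ingest_rate_eps * self.max_latency_ms // 1000

    def budget_headroom_ms(self, stage_budgets_ms: dict) -> int:
        """Latency left after every stage spends its budget (negative = over budget)."""
        return self.max_latency_ms - sum(stage_budgets_ms.values())


# Hypothetical per-stage split; the requirements only give the 300 ms total.
STAGE_BUDGETS_MS = {"ingest": 50, "processing": 120, "storage": 60, "serving": 50}

reqs = Requirements()
print(reqs.events_in_flight())                    # 30000
print(reqs.budget_headroom_ms(STAGE_BUDGETS_MS))  # 20
```

At 100k events per second and 300 ms end to end, roughly 30,000 events are inside the pipeline at any moment, which gives a starting point for sizing buffers and queues.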

Choose your observability outcomes:

  • Sufficient telemetry to diagnose latency and data skew
  • Quick root-cause analysis for outages
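
To make "sufficient telemetry" concrete, here is a small sketch of two signals that cover the outcomes above: tail latency computed from raw samples, and a skew ratio across partitions. Both helper names and the sample data are hypothetical, and the percentile uses the nearest-rank method, which is one of several common definitions.

```python
import math
from collections import Counter


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def skew_ratio(partition_keys):
    """Event count of the busiest partition divided by the mean count per partition.

    1.0 means perfectly even load. Only partitions that received at least one
    event are counted, so an idle partition is invisible to this metric.
    """
    counts = Counter(partition_keys)
    if not counts:
        raise ValueError("skew_ratio() needs at least one event")
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean


# Hypothetical end-to-end latencies in milliseconds, with one slow outlier.
latencies_ms = [120, 135, 150, 180, 210, 240, 260, 280, 310, 900]
print(percentile(latencies_ms, 50))  # 210
print(percentile(latencies_ms, 99))  # 900

# Hypothetical partition assignment for ten events; p0 is a hot partition.
partitions = ["p0"] * 6 + ["p1"] * 2 + ["p2"] * 2
print(round(skew_ratio(partitions), 2))
```

In this sample the median sits well under the 300 ms target while the 99th percentile does not, which is why tail latency rather than the average is the useful alerting signal. A skew ratio well above 1.0 points to a hot key or partition as a likely cause.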

Source: https://dev.to/therizwansaleem/designing-an-observability-driven-data-pipeline-for-real-time-analytics-4n8d