Building Video Heatmaps with HyperLogLog
Counting unique viewers per second kills your database.
One person watching a 9-minute video creates 540 rows, one per second. One million viewers create up to 540 million rows.
Most people use COUNT(DISTINCT). This is slow. It forces your database to sort or hash every matching row each time someone opens the analytics tab.
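The arithmetic is worth seeing once. This is a back-of-envelope sketch that assumes the worst case: one row per viewer per second, and every viewer watches to the end. The function name is made up for illustration.

```python
# Worst-case row count for per-second view tracking.
VIDEO_SECONDS = 9 * 60  # a 9-minute video


def rows_for(viewers: int, seconds: int = VIDEO_SECONDS) -> int:
    """One row per viewer per second watched."""
    return viewers * seconds


print(rows_for(1))          # 540
print(rows_for(1_000_000))  # 540000000
```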
Use HyperLogLog (HLL) in Postgres.
HLL gives a close estimate. It does not give a perfect number. For a heatmap, a close number is enough.
Why use HLL?
- Fixed size. A sketch is capped at a few kilobytes, whether it holds one viewer or fifty million viewers.
- Mergeable. You combine regions or time buckets in milliseconds.
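- Low error. The standard error is about 1.04/sqrt(m) for m registers, which is roughly 1.6% at 4,096 registers. That is invisible to the eye on a heatmap.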
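To make these three properties concrete, here is a toy HyperLogLog in plain Python. It is a teaching sketch, not the storage format or algorithm variant the Postgres hll extension uses. The class name, the SHA-1 hash, and the choice of 4,096 registers are all assumptions made for illustration.

```python
import hashlib
import math


class HLL:
    """Toy HyperLogLog with 2**p one-byte registers (illustrative only)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p                     # 4096 registers when p = 12
        self.registers = bytearray(self.m)  # size never changes after this

    def add(self, item: str) -> None:
        # Take 64 bits of a hash; the first p bits pick a register.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the first 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HLL") -> "HLL":
        # Union of two sketches = register-wise maximum.
        out = HLL(self.p)
        out.registers = bytearray(
            max(a, b) for a, b in zip(self.registers, other.registers)
        )
        return out

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


# Two overlapping audiences: 100,000 distinct viewers in total.
a, b = HLL(), HLL()
for i in range(60_000):
    a.add(f"viewer-{i}")
for i in range(40_000, 100_000):
    b.add(f"viewer-{i}")

merged = a.merge(b)
print(len(merged.registers))       # 4096 bytes, same as an empty sketch
print(round(merged.count()))       # an estimate near 100000, not exact
print(1.04 / math.sqrt(merged.m))  # 0.01625, the expected relative error
```

Merging is a register-wise max, so viewers counted in both sketches are not counted twice. That is why combining regions or buckets is cheap.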
The Setup:
- Install the hll extension in Postgres.
- Store one sketch per video, region, and time bucket.
- Use a Python worker to batch writes.
- Cache the results at the edge.
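The first three steps could look roughly like this. Treat it as a sketch under assumptions, not the original author's code. The table name video_heatmap, its columns, the 5-second bucket width, and the event tuple shape are all hypothetical. The SQL uses functions from the postgresql-hll extension (hll_empty, hll_hash_text, hll_add_agg, hll_union, hll_union_agg, hll_cardinality). The worker assumes a psycopg-style DB-API connection.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

BUCKET_SECONDS = 5  # assumed bucket width; pick what your heatmap needs

# Hypothetical schema: one sketch per (video, region, bucket).
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS hll;
CREATE TABLE IF NOT EXISTS video_heatmap (
    video_id bigint  NOT NULL,
    region   text    NOT NULL,
    bucket   integer NOT NULL,
    viewers  hll     NOT NULL DEFAULT hll_empty(),
    PRIMARY KEY (video_id, region, bucket)
);
"""

# Build a sketch from one batch of viewer ids, then merge it into the row.
UPSERT_SQL = """
INSERT INTO video_heatmap (video_id, region, bucket, viewers)
SELECT %s, %s, %s, hll_add_agg(hll_hash_text(v))
FROM unnest(%s::text[]) AS v
ON CONFLICT (video_id, region, bucket)
DO UPDATE SET viewers = hll_union(video_heatmap.viewers, EXCLUDED.viewers);
"""

# Read path: merge all regions per bucket, then estimate unique viewers.
HEATMAP_SQL = """
SELECT bucket, hll_cardinality(hll_union_agg(viewers)) AS unique_viewers
FROM video_heatmap
WHERE video_id = %s
GROUP BY bucket
ORDER BY bucket;
"""

# (video_id, region, position_in_seconds, viewer_id)
Event = Tuple[int, str, float, str]
Key = Tuple[int, str, int]


def group_events(events: Iterable[Event]) -> Dict[Key, Set[str]]:
    """Collapse raw events into one set of viewer ids per sketch key."""
    grouped: Dict[Key, Set[str]] = defaultdict(set)
    for video_id, region, position, viewer_id in events:
        bucket = int(position // BUCKET_SECONDS)
        grouped[(video_id, region, bucket)].add(viewer_id)
    return dict(grouped)


def flush(conn, events: Iterable[Event]) -> int:
    """Write one upsert per sketch key instead of one row per event."""
    grouped = group_events(events)
    with conn.cursor() as cur:
        for (video_id, region, bucket), viewers in grouped.items():
            cur.execute(UPSERT_SQL, (video_id, region, bucket, sorted(viewers)))
    conn.commit()
    return len(grouped)
```

Batching matters because many events land on the same key. Thousands of events for one video, region, and bucket become a single upsert. Edge caching is left out here, since it depends on your CDN.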
The Trade-offs:
- No exact counts.
- No individual user data.
- No subtraction of sets. You can merge sketches, but you cannot remove viewers from one.
The Result:
Your database stays small. Your analytics load in milliseconds. You save money on servers.
Source: https://dev.to/ahmet_gedik778845/building-video-heatmap-analytics-with-hyperloglog-in-postgres-42ah