𝗖𝗲𝗻𝘁𝗿𝗮𝗹𝗶𝘇𝗲𝗱 𝗟𝗼𝗴𝗴𝗶𝗻𝗴 𝗪𝗶𝘁𝗵 𝗚𝗿𝗮𝗳𝗮𝗻𝗮 𝗔𝗹𝗹𝗼𝘆 𝗮𝗻𝗱 𝗟𝗼𝗸𝗶
Finding logs used to mean SSH-ing into specific servers and running grep.
During an incident, you often do not know which box holds the error. You end up hopping between hosts and trying to match timestamps by eye. This process breaks down exactly when you need speed.
I moved to centralized logging. Now, every log line from every server lives in one place. I use Grafana Alloy on each host to ship logs to Loki.
Why this stack?
- ELK is heavy. Elasticsearch requires too much maintenance and hardware.
- Loki indexes labels instead of full text. This makes it cheaper and easier to run.
- SaaS tools get expensive quickly as your fleet grows.
- Alloy is Grafana's current collector and the successor to Promtail and Grafana Agent. One lightweight agent per host handles tailing, labeling, and shipping.
The Setup
Alloy reads files on each host. It adds labels like host, environment, and service. It then pushes these to Loki. I also built a Slack bot. It hits the Loki API so the team can pull logs without leaving their chat channel.
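As a rough sketch of the bot's read path, the snippet below builds a request against Loki's `query_range` HTTP endpoint and flattens the response into lines. The base URL, function names, and defaults are assumptions for illustration, not the actual bot code.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical Loki address; replace with your own.
LOKI_URL = "http://loki.example.internal:3100"


def build_query_url(base_url, logql, minutes=15, limit=50, now_ns=None):
    """Build a query_range URL covering the last `minutes` minutes."""
    end_ns = now_ns if now_ns is not None else time.time_ns()
    start_ns = end_ns - minutes * 60 * 1_000_000_000
    params = urllib.parse.urlencode({
        "query": logql,
        "start": start_ns,       # nanosecond Unix timestamps
        "end": end_ns,
        "limit": limit,
        "direction": "backward",  # newest lines first
    })
    return f"{base_url}/loki/api/v1/query_range?{params}"


def extract_lines(body):
    """Flatten a streams response into (timestamp_ns, line) pairs, newest first."""
    lines = []
    for stream in body["data"]["result"]:
        for ts, line in stream["values"]:
            lines.append((int(ts), line))
    return sorted(lines, reverse=True)


def fetch_lines(logql, base_url=LOKI_URL, timeout=10):
    """Run the query against Loki and return the flattened lines."""
    url = build_query_url(base_url, logql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_lines(json.load(resp))
```

A Slack command handler could call something like `fetch_lines('{service="checkout", environment="prod"} |= "error"')` and post the result back to the channel.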
Real-world challenges I faced:
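- Permissions: Alloy runs as its own user. If your app logs are restricted, those files never reach Loki and nothing on the application side tells you. You must add the alloy user to the group that owns the log files, or grant it read access another way.
- Mixed OS fleets: You will have Debian, Ubuntu, and RHEL boxes. You must use the correct package manager for each: apt on Debian and Ubuntu, dnf or yum on RHEL.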
- Legacy agents: Old log shippers can cause double-shipping. You must find and remove them during rollout.
- Multiline logs: Java stack traces span many lines. Without a multiline stage keyed on a first-line regex, a single error becomes forty separate, useless entries.
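To make the multiline point concrete, here is a minimal Python sketch of the grouping idea: a line that matches the first-line regex starts a new entry, and every other line is appended to the current one. This only illustrates the logic, it is not Alloy's implementation, and the timestamp format is an assumption about the app's logs.

```python
import re

# Assumed format: every new log entry starts with a date and time.
FIRST_LINE = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")


def group_entries(lines, first_line=FIRST_LINE):
    """Merge continuation lines (e.g. stack frames) into the entry above them."""
    entries = []
    current = []
    for line in lines:
        # A matching line closes the entry in progress and starts a new one.
        if first_line.match(line) and current:
            entries.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        entries.append("\n".join(current))
    return entries
```

Fed a log line followed by an exception and its stack frames, this yields one entry for the whole error instead of one entry per frame.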
The Golden Rule of Labels
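Do not put high-cardinality data in labels. Never use request IDs or user IDs as labels. Loki creates a separate stream for every unique combination of label values, so an unbounded label produces a flood of tiny streams and bloats the index. Use labels for bounded things like service name or environment. Use LogQL line filters at query time for everything else.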
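A small sketch of the rule in practice: labels go in the stream selector, and the request ID goes in a line filter. The second helper gives a back-of-the-envelope upper bound on stream count. Both function names and the sample values are made up for illustration.

```python
def _quote(value):
    """Escape backslashes and double quotes for a LogQL string literal."""
    return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'


def logql(labels, contains=None):
    """Build a selector from low-cardinality labels plus an optional line filter."""
    selector = ",".join(f"{k}={_quote(v)}" for k, v in sorted(labels.items()))
    query = "{" + selector + "}"
    if contains is not None:
        query += " |= " + _quote(contains)
    return query


def max_streams(label_values):
    """Upper bound on streams: the product of distinct values per label."""
    total = 1
    for values in label_values.values():
        total *= len(values)
    return total


# Request ID as a filter, not a label.
print(logql({"service": "checkout", "environment": "prod"}, contains="req-8f3a"))

# 20 hosts x 2 environments x 5 services stays small...
bounded = {"host": range(20), "environment": range(2), "service": range(5)}
print(max_streams(bounded))
# ...until a request_id label multiplies it by every request ever seen.
print(max_streams({**bounded, "request_id": range(100_000)}))
```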
The Result
Centralized logging also lets you turn logs into metrics. You can alert on error rates instead of waiting for a human to notice a problem. When an incident hits, the answer is one Slack command away.
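One way to picture "logs into metrics" is a LogQL metric query that counts matching lines per second, checked against a threshold. The sketch below builds such a query and evaluates a vector-style response from Loki's instant query endpoint. The `ERROR` match string, the window, and the threshold are assumptions, and in practice this check would more likely live in an alert rule than in a script.

```python
def error_rate_query(service, window="5m", needle="ERROR"):
    """LogQL metric query: per-second rate of lines containing `needle`."""
    return f'sum(rate({{service="{service}"}} |= "{needle}" [{window}]))'


def breaches(body, threshold):
    """True if any sample in a vector response exceeds `threshold` lines/sec."""
    return any(
        float(sample["value"][1]) > threshold
        for sample in body["data"]["result"]
    )


# Example response shape for an instant metric query (values are made up).
sample_body = {
    "data": {
        "resultType": "vector",
        "result": [{"metric": {}, "value": [1714557600, "2.5"]}],
    }
}

print(error_rate_query("checkout"))
print(breaches(sample_body, threshold=1.0))
```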
Optional learning community: https://t.me/GyaanSetuAi
