𝗔𝘀𝘆𝗻𝗰 𝗗𝗡𝗦 𝗥𝗲𝘀𝗼𝗹𝘂𝘁𝗶𝗼𝗻 𝘄𝗶𝘁𝗵 𝗮𝗶𝗼𝗱𝗻𝘀
You crawl thousands of video pages per hour. Latency adds up.
We thought bandwidth slowed us down. We were wrong. DNS was the bottleneck.
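Python's default DNS resolution is blocking. asyncio hides this by running getaddrinfo in a thread pool. The default pool caps at 32 threads. Queue 500 lookups and most of them wait for a free thread. Your code waits with them.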
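Here is a minimal sketch of why a thread pool runs out. It makes no real DNS calls. fake_blocking_lookup is a hypothetical stand-in for a blocking getaddrinfo call. The 0.05 second delay and the pool size of 4 are made-up numbers chosen to keep the demo fast.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fake_blocking_lookup(host: str, delay: float = 0.05) -> str:
    """Stand-in for a blocking getaddrinfo call (hypothetical helper)."""
    time.sleep(delay)  # holds the worker thread for the whole lookup
    return "192.0.2.1"  # documentation-range address, not a real answer


async def resolve_all(hosts: List[str], workers: int) -> float:
    """Resolve every host through a bounded pool and return elapsed seconds."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        start = time.perf_counter()
        # Every lookup needs a free thread; extra lookups queue behind the pool.
        await asyncio.gather(
            *(loop.run_in_executor(pool, fake_blocking_lookup, h) for h in hosts)
        )
        return time.perf_counter() - start


def measure(n_hosts: int, workers: int) -> float:
    """Time n_hosts simulated lookups on a pool with the given worker count."""
    hosts = [f"host{i}.example" for i in range(n_hosts)]
    return asyncio.run(resolve_all(hosts, workers))
```

With 12 hosts and 4 workers, measure(12, 4) runs the lookups in three waves. Total time is about three times one lookup. Scale that to 500 hosts on 32 threads and you get about 16 waves.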
We switched to aiodns. It wraps c-ares through pycares. Lookups run on the event loop without blocking it, so no thread is held per lookup.
The results:
- p99 latency dropped by 40 percent.
- Median time to first byte fell from 180ms to 90ms.
- CPU use went up because workers spent less time waiting on DNS and more time doing work.
How to do it:
- Install aiodns via pip.
- Pass an AsyncResolver to your aiohttp TCPConnector through its resolver argument.
- Pin nameservers to 1.1.1.1 and 8.8.8.8.
- Set the DNS timeout to 2 seconds.
- Use a cache that respects record TTLs.
- Use a per-host lock so concurrent requests for the same host share one lookup.
- Log DNS errors separately from connection errors.
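The install, resolver, nameserver, and timeout steps might look like the sketch below. It assumes aiohttp and aiodns are installed (pip install aiohttp aiodns). fetch_status and the constant names are ours, for illustration only. We assume the timeout keyword is forwarded through aiodns to the c-ares channel, so check that against your installed versions. The 60 second cache value is also an assumption.

```python
import aiohttp  # assumes: pip install aiohttp aiodns

NAMESERVERS = ["1.1.1.1", "8.8.8.8"]
DNS_TIMEOUT_S = 2.0   # assumed to be forwarded to c-ares as its query timeout
DNS_CACHE_TTL_S = 60  # illustrative fixed expiry, not the record's own TTL


async def fetch_status(url: str) -> int:
    """Fetch a URL through a connector that resolves names with aiodns."""
    # Build the resolver and connector inside a running event loop.
    resolver = aiohttp.AsyncResolver(
        nameservers=NAMESERVERS,
        timeout=DNS_TIMEOUT_S,
    )
    connector = aiohttp.TCPConnector(
        resolver=resolver,
        use_dns_cache=True,
        ttl_dns_cache=DNS_CACHE_TTL_S,
    )
    # The session owns the connector and closes it on exit.
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get(url) as response:
            return response.status
```

Call it with asyncio.run(fetch_status("https://example.com/")). In a real crawler you would build one session and reuse it for every request. A new session per URL throws away the DNS cache and the connection pool.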
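The cache, lock, and logging steps can be sketched with the standard library alone. This is one possible shape, not a drop-in component. SingleFlightDNSCache, ResolveFn, and the "crawler.dns" logger name are hypothetical. We assume you supply a resolver coroutine that returns the addresses together with the record TTL in seconds.

```python
import asyncio
import logging
import time
from typing import Awaitable, Callable, Dict, List, Tuple

# Dedicated logger so DNS failures can be filtered apart from connection errors.
dns_log = logging.getLogger("crawler.dns")  # hypothetical logger name

# Hypothetical resolver signature: host -> (addresses, ttl_seconds).
ResolveFn = Callable[[str], Awaitable[Tuple[List[str], float]]]


class SingleFlightDNSCache:
    """TTL-aware cache that allows one in-flight lookup per host."""

    def __init__(
        self,
        resolve: ResolveFn,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}  # host -> (expiry, addrs)
        self._locks: Dict[str, asyncio.Lock] = {}  # grows per host; prune in real use

    def _fresh(self, host: str):
        """Return cached addresses if the entry has not expired, else None."""
        entry = self._entries.get(host)
        if entry is not None and entry[0] > self._clock():
            return entry[1]
        return None

    async def lookup(self, host: str) -> List[str]:
        cached = self._fresh(host)
        if cached is not None:
            return cached
        lock = self._locks.setdefault(host, asyncio.Lock())
        async with lock:
            # Re-check: another task may have filled the cache while we waited.
            cached = self._fresh(host)
            if cached is not None:
                return cached
            try:
                addrs, ttl = await self._resolve(host)
            except Exception:
                dns_log.exception("DNS lookup failed for %s", host)
                raise
            # Expire the entry when the record's own TTL runs out.
            self._entries[host] = (self._clock() + ttl, addrs)
            return addrs
```

Fifty concurrent calls to lookup("example.com") trigger one call to the resolver. The other 49 wait on the lock and then read the cache. If the lookup fails, nothing is cached and each waiting task retries in turn.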
If your crawler is slow, profile it with py-spy. Run py-spy top or py-spy dump against the live process. Look for getaddrinfo in the top frames. If you see it, move to aiodns.