Three Sleep Intervals for Three APIs
I built ETL pipelines for three directory sites in April. Each site uses a different API: Steam, GitHub, and HuggingFace.
I had to set sleep intervals for each one. The numbers, failure modes, and error handling are all different. Here is what I use and why.
Steam: 250ms sleep
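Steam's documentation is vague about rate limits. Community data suggests roughly 200 requests every 5 minutes per IP. Spread evenly, that is 300 seconds divided by 200 requests, so a 1.5 second interval is safe no matter how long the job runs.
I use 250ms instead. My nightly job only processes 60 game entries, which fits well inside the 200-request budget even if every call lands in the same 5-minute window. At 250ms, the total sleep time is 15 seconds. At 1.5 seconds, it becomes 90 seconds. The saved time adds up when the same job has to cover multiple sites.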
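The arithmetic behind those numbers, as a small Python sketch. The constants come from the figures above. The Steam limit is the community estimate, not an official value, and the function names are only illustrative.

```python
# Figures from the text. The Steam limit is a community estimate, not official.
WINDOW_SECONDS = 5 * 60
REQUESTS_PER_WINDOW = 200
ENTRIES = 60


def safe_interval(window_seconds: float, requests_per_window: int) -> float:
    """Spacing that stays under the budget however long the job runs."""
    return window_seconds / requests_per_window


def total_sleep(entries: int, interval_seconds: float) -> float:
    """Time spent sleeping if the job sleeps once per entry."""
    return entries * interval_seconds


print(safe_interval(WINDOW_SECONDS, REQUESTS_PER_WINDOW))  # 1.5
print(total_sleep(ENTRIES, 0.25))                          # 15.0
print(total_sleep(ENTRIES, 1.5))                           # 90.0
print(ENTRIES <= REQUESTS_PER_WINDOW)                      # True: the whole run fits in one window
```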
If Steam returns an error, the job does not stop. It logs the error and moves to the next item. The data is updated the next night.
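A minimal sketch of that log-and-continue loop, written in Python as an assumption since the pipeline's actual language is not stated here. The names fetch_all and fetch_one are hypothetical. fetch_one stands in for whatever function makes the real Steam request.

```python
import logging
import time
from typing import Callable, Dict, Iterable

logger = logging.getLogger("etl")


def fetch_all(
    ids: Iterable[int],
    fetch_one: Callable[[int], dict],  # hypothetical: performs the real API call
    sleep_seconds: float = 0.25,
) -> Dict[int, dict]:
    """Fetch each id in turn. A failed item is logged and skipped."""
    results: Dict[int, dict] = {}
    for item_id in ids:
        try:
            results[item_id] = fetch_one(item_id)
        except Exception as exc:
            # Broad on purpose: one bad item must not stop the nightly run.
            logger.warning("fetch failed for %s: %s", item_id, exc)
        time.sleep(sleep_seconds)  # pause after every call, success or failure
    return results
```

Passing the fetch function in as an argument keeps the pacing and error handling separate from the HTTP details, so the same loop can be reused with a different sleep value for another API.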
GitHub: 100ms sleep
GitHub documents its limits clearly. Unauthenticated requests are limited to 60 per hour per IP address. Requests authenticated with a token get 5,000 per hour.
I use a 100ms sleep as a politeness measure. The token does the heavy lifting for the rate limit. My pipeline uses the core REST API, not the search API. The search API has its own, much tighter limit, so staying on the core endpoints keeps the full 5,000-per-hour budget available.
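A sketch of how the token and the 100ms pause might fit together in Python. The helper names are hypothetical. Reading the x-ratelimit-remaining and x-ratelimit-reset response headers is an optional extra that goes beyond the fixed sleep described above, and the code assumes the header names have already been lowercased by the HTTP client.

```python
import time
from typing import Mapping, Optional


def github_headers(token: Optional[str]) -> dict:
    """Request headers. Without a token the call falls under the 60/hour limit."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def seconds_to_wait(
    response_headers: Mapping[str, str],  # assumed to use lowercase keys
    now: Optional[float] = None,
    polite_sleep: float = 0.1,
) -> float:
    """Return the 100ms pause, or the time until reset if the budget is spent."""
    remaining = int(response_headers.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return polite_sleep
    # GitHub reports the reset time as UTC epoch seconds.
    reset_at = float(response_headers.get("x-ratelimit-reset", "0"))
    current = time.time() if now is None else now
    return max(polite_sleep, reset_at - current)
```

With budget left, seconds_to_wait returns the usual 0.1 seconds. Only when the remaining count reaches zero does it stretch to the reset time.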
HuggingFace: No sleep
I have not hit a rate limit in weeks of nightly runs. In my experience, the registry API copes well with small batch tools like mine.
I fetch up to 100 models at once. I use an authentication token to raise the limits even higher. For 100 models, no sleep is the simplest solution.
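A sketch of what that request could look like, assuming "up to 100 models at once" means a single listing call with limit=100 against the public Hub models endpoint. The helper name build_models_request is hypothetical, and the code only builds the URL and headers without sending anything.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

HF_MODELS_ENDPOINT = "https://huggingface.co/api/models"


def build_models_request(
    limit: int = 100, token: Optional[str] = None
) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for one listing call. No sleep is involved."""
    url = f"{HF_MODELS_ENDPOINT}?{urlencode({'limit': limit})}"
    # The token is optional. It is sent as a standard bearer token.
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return url, headers


url, headers = build_models_request(limit=100)
print(url)  # https://huggingface.co/api/models?limit=100
```

Because the whole batch comes back from one call, there is nothing to pace, which is why a sleep adds no protection here.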
Summary Table:
• Steam: 250ms sleep. Non-fatal errors.
• GitHub: 100ms sleep. Non-fatal errors.
• HuggingFace: No sleep. Non-fatal errors.
The sleep interval is a guess. The real protection is how I handle errors. Every API call uses a try/catch block. If a call fails, the system writes a fallback row instead of crashing.
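A minimal sketch of the fallback-row pattern in Python, where try/catch is spelled try/except. The function name and the row fields (id, source, status, error, fetched_at) are hypothetical. The actual fallback row format is not described in this post.

```python
from datetime import datetime, timezone
from typing import Callable


def fetch_or_fallback(
    item_id: str,
    source: str,
    fetch_one: Callable[[str], dict],  # hypothetical: performs the real API call
) -> dict:
    """Return the fetched row, or a placeholder row if the call fails."""
    try:
        row = fetch_one(item_id)
        return {**row, "id": item_id, "source": source, "status": "ok"}
    except Exception as exc:
        # The failure becomes data instead of a crash. The row can be
        # overwritten when a later nightly run succeeds.
        return {
            "id": item_id,
            "source": source,
            "status": "error",
            "error": str(exc),
            "fetched_at": datetime.now(timezone.utc).isoformat(),
        }
```

Every item produces exactly one row either way, so downstream steps never have to handle a missing entry.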
The sleep interval controls how often you hit a limit. The error handling controls what happens when you do.
